BestERP

Goliath 120B

overall: 0.7 (13)
nsfw: 0.6 (7)
role playing: 0.8 (6)
story writing: 0.8 (5)

Overview

Merge of Xwin and Euryale, particularly good at story-telling and role-play.

Benchmarks

Here's a summary of a benchmark by the great WolframRavenwolf. Goliath dominated this benchmark, specifically this RP-oriented quant.

  • 👍 = +1
  • ➕ = +0.5
  • ➖ = -0.5
  • ❌ = -1
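As a rough illustration of how the legend above could be turned into a single number per test, here is a minimal sketch. The names `POINTS` and `tally` are hypothetical, and simply summing the markers is an assumption: the source gives only the per-marker values, not how (or whether) the benchmark author totalled them. The sample markers are the ones from the "MGHC, official Vicuna 1.1 format" list on this page.

```python
# Point values copied from the legend above.
POINTS = {"👍": 1.0, "➕": 0.5, "➖": -0.5, "❌": -1.0}


def tally(marks):
    """Sum the point values for a sequence of rating markers.

    Assumption: a test's score is the plain sum of its markers.
    An unknown marker raises KeyError rather than being ignored.
    """
    return sum(POINTS[m] for m in marks)


# Markers from the "MGHC, official Vicuna 1.1 format" list, in order.
mghc_vicuna = ["👍", "👍", "➕", "➖", "➖"]
print(tally(mghc_vicuna))  # 1.5
```

A plain sum keeps the sketch easy to check by hand: two 👍 (+2), one ➕ (+0.5), and two ➖ (-1) give 1.5.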

Amy, official Vicuna 1.1 format:

  • 👍 Average Response Length: 294 (within my max new tokens limit of 300)
  • 👍 Excellent writing, detailed action descriptions, amazing attention to detail
  • 👍 Finally a model that exhibits a real sense of humor through puns and wordplay as stated in the character card
  • 👍 Finally a model that uses colorful language and cusses as stated in the character card
  • 👍 Gave very creative (and uncensored) suggestions of what to do (even suggesting some of my actual limit-testing scenarios)
  • 👍 Novel ideas and engaging writing, made me want to read on what happens next, even though I've gone through this test scenario so many times already
  • No emojis at all (only one in the greeting message)
  • ➖ Suggested things going against her background/character description
  • ➖ Spelling/grammar mistakes (e.g. "nippleless nipples")

Amy, Roleplay preset:

  • 👍 Average Response Length: 223 (within my max new tokens limit of 300)
  • 👍 Excellent writing, detailed action descriptions, amazing attention to detail
  • 👍 Finally a model that exhibits a real sense of humor through puns and wordplay as stated in the character card
  • 👍 Gave very creative (and uncensored) suggestions of what to do (even suggesting some of my actual limit-testing scenarios)
  • No emojis at all (only one in the greeting message)

MGHC, official Vicuna 1.1 format:

  • 👍 Only model that considered the payment aspect of the scenario
  • 👍 Believable reactions and engaging writing, made me want to read on what happens next, even though I've gone through this test scenario so many times already
  • ➕ Very unique patients (one I never saw before)
  • ➖ Gave analysis on its own, but also after most messages, and later included Doctor's inner thoughts instead of the patient's
  • ➖ Spelling/grammar mistakes (properly spelled words, but in the wrong places)

MGHC, Roleplay preset:

  • 👍 Believable reactions and engaging writing, made me want to read on what happens next, even though I've gone through this test scenario so many times already
  • 👍 Excellent writing, detailed action descriptions, amazing attention to detail
  • ➖ No analysis on its own
  • ➖ Spelling/grammar mistakes (e.g. "loufeelings", "earrange")
  • ➖ Third patient was same species as the first

Reviews

4.0

Very good at ERP from my experience. Descriptive and relatively creative, but still lacking in some areas.

4.5

Best local model I've ever used.

  • + Has a sense of humor(!!!), this is the first model that joked unprompted
  • + Very creative
  • + Very smart
  • + Juicy
  • - Has some GPTisms in it. If you spot one, remove it ASAP, otherwise the model will keep spitting them out
  • - Sometimes has trouble following the prompt, solved by editing

2.0

From my experience using it more, it clearly has significant flaws that others tend to overlook. If you are constantly having to reroll or edit with a 70B+ model, something is wrong, and unfortunately this model tends to face such issues. There is likely an overfocus on isolated examples of prose rather than on the overall consistency, accuracy, and coherency of the model, especially as context grows.

  • Spelling errors
  • Grammar errors
  • Formatting errors
  • Picking incorrect tokens and words
  • Token probabilities collapsing and causing infinite looping output
  • Confusing similar concepts, i.e. it will use a related but incorrect token, e.g. brother vs. sister
  • Mixing up the user and the character's details and personality
  • Hallucinating details

While it can produce some creative outputs, this likely is due to token probabilities being unstable due to how it was merged or stitched together, which also likely causes a lot of the issues mentioned.

All in all, getting creative, good prose some of the time cannot offset these fairly constant issues, especially given the massive size of the model and how slow it is compared to alternatives that do not have these issues.

4.5
quant: gguf q8

Goliath is outstanding in its ability to handle complicated characters with a high level of coherency. It often follows a train of thought well and very rarely deviates from instructions (given that they are provided in the correct format). It does lack in its inherent ability to write good NSFW content, often providing top-level summaries of what's happening.

5.0

It depicts a beautiful world without the preachy feel of GPT. Excellent understanding and no jailbreak required. Large models may be greatly affected by quantization. Someone hosted this on Horde a long time ago and there were no errors or grammar mistakes. I had a wonderful time for about a week. Nowadays, I almost have no choice but to use that API, but I feel that the performance is a little lower than it was then. I don't know if it's a difference in some settings or an effect of quantization. Maybe it's just my imagination. However, if you swipe it a few times, it will still give you the kind of depiction that only this model can do. I have several characters that I created myself that I only use with this model. I only use it when I want to see them. It would be a waste to use it for ERP. Simply because the usage fee is high.

0.5
quant: gguf q5

This is just another garbage merge model. So many 13B and 7B role-play models beat this for many reasons.

This always messes up its spelling, or goes off topic / changes my role-play topic. It's quite annoying.