Prompt template is Orca-Vicuna:
SYSTEM: {system_message}
USER: {prompt}
ASSISTANT:
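For reference, a minimal sketch of assembling a prompt in this format (the helper name below is made up; only the SYSTEM/USER/ASSISTANT layout comes from the template above):

```python
def build_orca_vicuna_prompt(system_message: str, prompt: str) -> str:
    # Orca-Vicuna layout from the template above; the trailing "ASSISTANT:"
    # leaves the model to continue with its reply.
    return f"SYSTEM: {system_message}\nUSER: {prompt}\nASSISTANT:"

text = build_orca_vicuna_prompt(
    "You are a helpful assistant.",
    "Summarize the plot of Hamlet in two sentences.",
)
```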
As this is a Yi model, try disabling the BOS token and/or running a lower temperature with MinP (and no other samplers) if the output doesn't seem right. Yi tends to run "hot" by default.
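As a rough sketch of those sampler settings with the ExLlamaV2 Python API (attribute names and defaults can differ between versions, and the values here are illustrative rather than tuned):

```python
from exllamav2.generator import ExLlamaV2Sampler

settings = ExLlamaV2Sampler.Settings()
settings.temperature = 0.8   # run cooler than default; Yi tends to run "hot"
settings.min_p = 0.1         # MinP as the only active truncation sampler
settings.top_k = 0           # neutralize top-k
settings.top_p = 1.0         # neutralize top-p

# If output still seems off, try generating without the BOS token, e.g.:
# generator.generate_simple(prompt, settings, max_new_tokens, add_bos=False)
```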
Sometimes the model "spells out" the stop token as </s> like Capybara, so you may need to add </s> as an additional stopping condition.
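If your frontend doesn't support extra stop strings, a hypothetical post-processing helper like this (not part of any particular library) shows the idea:

```python
STOP_STRINGS = ["</s>"]  # the spelled-out form of the stop token

def truncate_at_stop(text: str, stop_strings=STOP_STRINGS) -> str:
    # Cut the generation at the first spelled-out stop string, if present.
    cut = len(text)
    for s in stop_strings:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(truncate_at_stop("Sure, here you go.</s> SYSTEM:"))  # -> "Sure, here you go."
```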
It might also respond to the Llama-2 chat format.
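For comparison, the standard Llama-2 chat layout looks roughly like this (an assumption based on the common template, untested with this model; only worth trying if the Orca-Vicuna template gives poor results):

```python
def build_llama2_chat_prompt(system_message: str, prompt: str) -> str:
    # Standard Llama-2 chat layout; not verified against this model.
    return f"[INST] <<SYS>>\n{system_message}\n<</SYS>>\n\n{prompt} [/INST]"
```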
Tested with a 4bpw exl2 quant.
Tested up to 24K context; it stays coherent at least that far.
No limarp-isms, which is great.