Use the Alpaca prompt format.

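
The sketch below shows how to build an Alpaca-style prompt. It assumes the model expects the standard Stanford Alpaca template. This card does not spell out the exact template used in training, so treat it as an assumption. `build_alpaca_prompt` and the `PROMPT_*` constants are hypothetical helpers, not part of any library.

```python
# Minimal sketch of Alpaca-style prompting.
# Assumption: the standard Stanford Alpaca template; the exact template used
# to train this model is not stated in the card.

PROMPT_NO_INPUT = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)

PROMPT_WITH_INPUT = (
    "Below is an instruction that describes a task, paired with an input "
    "that provides further context. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Input:\n{input}\n\n### Response:\n"
)


def build_alpaca_prompt(instruction: str, input_text: str = "") -> str:
    """Return an Alpaca-formatted prompt; generation continues after '### Response:'."""
    if input_text:
        return PROMPT_WITH_INPUT.format(instruction=instruction, input=input_text)
    return PROMPT_NO_INPUT.format(instruction=instruction)


if __name__ == "__main__":
    # Example with an input section; pass the resulting string to the model as-is.
    print(build_alpaca_prompt("Translate the text to French.", "Good morning"))
```

Omit `input_text` for instructions that need no extra context, and the `### Input:` section is left out.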
Intellect has a 4k context window and comes close to 30B models on the benchmarks below.

| Metric | Value |
|---|---|
| Avg. | 51.29 |
| ARC (25-shot) | 63.23 |
| HellaSwag (10-shot) | 85.33 |
| MMLU (5-shot) | 57.36 |
| TruthfulQA (0-shot) | 51.65 |
| Winogrande (5-shot) | 76.09 |
| GSM8K (5-shot) | 10.92 |
| DROP (3-shot) | 14.46 |
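
The card does not say how "Avg." is computed. As a quick check, an unweighted mean of the seven scores reproduces the reported 51.29 after rounding to two decimals. That averaging rule is an assumption, not a documented definition.

```python
# Check "Avg." against an unweighted mean of the seven benchmark scores.
# Assumption: Avg. is a simple unweighted mean (the card does not state this).
scores = {
    "ARC (25-shot)": 63.23,
    "HellaSwag (10-shot)": 85.33,
    "MMLU (5-shot)": 57.36,
    "TruthfulQA (0-shot)": 51.65,
    "Winogrande (5-shot)": 76.09,
    "GSM8K (5-shot)": 10.92,
    "DROP (3-shot)": 14.46,
}

average = sum(scores.values()) / len(scores)
print(f"{average:.2f}")  # prints 51.29
```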