Code: [https://github.com/TIGER-AI-Lab/MAmmoTH2](https://github.com/TIGER-AI-Lab/MAmmoTH2)

## Introduction
Introducing 🦣 MAmmoTH2, a game-changer in improving the reasoning abilities of large language models (LLMs) through innovative instruction tuning. By efficiently harvesting 10 million instruction-response pairs from the pre-training web corpus, we've developed MAmmoTH2 models that significantly boost performance on reasoning benchmarks. For instance, MAmmoTH2-7B (Mistral) sees its performance soar from 11% to 34% on MATH and from 36% to 67% on GSM8K, all without training on any domain-specific data. Further training on public instruction tuning datasets yields MAmmoTH2-Plus, setting new standards in reasoning and chatbot benchmarks. Our work presents a cost-effective approach to acquiring large-scale, high-quality instruction data, offering a fresh perspective on enhancing LLM reasoning abilities.

|      | **Base Model** | **MAmmoTH2**                                                  | **MAmmoTH2-Plus**                                                       |
|:-----|:---------------|:--------------------------------------------------------------|:------------------------------------------------------------------------|
| 7B   | Mistral        | 🦣 [MAmmoTH2-7B](https://huggingface.co/TIGER-Lab/MAmmoTH2-7B)     | 🦣 [MAmmoTH2-7B-Plus](https://huggingface.co/TIGER-Lab/MAmmoTH2-7B-Plus)     |
| 8B   | Llama-3        | 🦣 [MAmmoTH2-8B](https://huggingface.co/TIGER-Lab/MAmmoTH2-8B)     | 🦣 [MAmmoTH2-8B-Plus](https://huggingface.co/TIGER-Lab/MAmmoTH2-8B-Plus)     |
| 8x7B | Mixtral | 🦣 [MAmmoTH2-8x7B](https://huggingface.co/TIGER-Lab/MAmmoTH2-8x7B) | 🦣 [MAmmoTH2-8x7B-Plus](https://huggingface.co/TIGER-Lab/MAmmoTH2-8x7B-Plus) |
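
All checkpoints are hosted on the Hugging Face Hub, so they can be loaded with the standard `transformers` causal-LM API. The snippet below is a minimal sketch rather than an official usage example: the repo id comes from the table above, while the prompt, generation settings, and the assumption that the Plus tokenizer ships a chat template are ours.

```python
# Minimal sketch: load a MAmmoTH2 checkpoint as a standard causal LM (assumed usage, not official).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "TIGER-Lab/MAmmoTH2-7B-Plus"  # any repo id from the table above

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # assumption: bf16 weights fit on your GPU; fall back to float16 or CPU otherwise
    device_map="auto",
)

# Hypothetical prompt; we assume the -Plus tokenizer provides a chat template.
messages = [
    {"role": "user", "content": "A train travels 60 km in 45 minutes. What is its average speed in km/h?"}
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=256)
# Decode only the newly generated tokens, skipping the prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```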

## Training Data

Please refer to https://huggingface.co/datasets/TIGER-Lab/WebInstructSub for more details.

![Project Framework](webinstruct.png)
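
Since WebInstructSub is a Hub dataset, it can be inspected directly with the `datasets` library. This is a minimal sketch that assumes the default configuration and a `train` split; see the dataset card for the actual splits and column names.

```python
# Minimal sketch: stream a few examples from WebInstructSub without downloading the full dataset.
# The split name and column layout are assumptions; consult the dataset card for specifics.
from datasets import load_dataset

ds = load_dataset("TIGER-Lab/WebInstructSub", split="train", streaming=True)

for i, example in enumerate(ds):
    print(example)  # one mined instruction-response pair from the web corpus
    if i >= 2:
        break
```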

## Training Procedure

The models are evaluated using open-ended and multiple-choice math problems from the benchmarks listed below.

| **Model**         | **TheoremQA** | **MATH** | **GSM8K** | **GPQA** | **MMLU-ST** | **BBH** | **ARC-C** | **Avg** |
|:------------------|:--------------|:---------|:----------|:---------|:------------|:--------|:----------|:--------|
| **MAmmoTH2-7B**   | 26.7          | 34.2     | 67.4      | 34.8     | 60.6        | 60.0    | 81.8      | 52.2    |
| **MAmmoTH2-8B**   | 29.7          | 33.4     | 67.9      | 38.4     | 61.0        | 60.8    | 81.0      | 53.1    |
| **MAmmoTH2-8x7B** | 32.2          | 39.0     | 75.4      | 36.8     | 67.4        | 71.1    | 87.5      | 58.9    |

If you use the models, data, or code from this project, please cite the original paper:

```
@article{yue2024mammoth2,
  title={MAmmoTH2: Scaling Instructions from the Web},
  author={Yue, Xiang and Zheng, Tuney and Zhang, Ge and Chen, Wenhu},
  journal={arXiv preprint arXiv:2405.03548},
  year={2024}
}
```