Commit 15a509d by leaderboard-pr-bot
1 Parent(s): 2e865c1

Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)
  1. README.md +137 -34
README.md CHANGED
@@ -1,4 +1,6 @@
1
  ---
2
  license: mit
3
  datasets:
4
  - pints-ai/Expository-Prose-V1
@@ -10,40 +12,21 @@ datasets:
10
  - togethercomputer/llama-instruct
11
  - LDJnr/Capybara
12
  - HuggingFaceH4/ultrafeedback_binarized
13
- language:
14
- - en
15
- model-index:
16
- - name: 1.5-Pints
17
- results:
18
- - task:
19
- type: text-generation
20
- dataset:
21
- name: MTBench
22
- type: ai2_arc
23
- metrics:
24
- - name: MTBench
25
- type: LLM-as-a-Judge
26
- value: 3.73
27
- source:
28
- name: MTBench
29
- url: https://huggingface.co/spaces/lmsys/mt-bench
30
  pipeline_tag: text-generation
31
- extra_gated_prompt: >-
32
- Though best efforts has been made to ensure, as much as possible, that all
33
- texts in the training corpora are royalty free, this does not constitute a
34
- legal guarantee that such is the case. **By using any of the models, corpora
35
- or part thereof, the user agrees to bear full responsibility to do the
36
- necessary due diligence to ensure that he / she is in compliance with their
37
- local copyright laws. Additionally, the user agrees to bear any damages
38
- arising as a direct cause (or otherwise) of using any artifacts released by
39
- the pints research team, as well as full responsibility for the consequences
40
- of his / her usage (or implementation) of any such released artifacts. The
41
- user also indemnifies Pints Research Team (and any of its members or agents)
42
- of any damage, related or unrelated, to the release or subsequent usage of any
43
- findings, artifacts or code by the team. For the avoidance of doubt, any
44
- artifacts released by the Pints Research team are done so in accordance with
45
- the 'fair use' clause of Copyright Law, in hopes that this will aid the
46
- research community in bringing LLMs to the next frontier.
47
  extra_gated_fields:
48
  Company: text
49
  Country: country
@@ -56,6 +39,113 @@ extra_gated_fields:
56
  - label: Other
57
  value: other
58
  I agree to use this model for in accordance to the afore-mentioned Terms of Use: checkbox
59
  ---
60
 
61
  # 1.5-Pints -- A model pretrained in 9 days by using high quality data
@@ -257,4 +347,17 @@ Though best efforts has been made to ensure, as much as possible, that all texts
257
 
258
  Additionally, the **user agrees to bear any damages** arising as a direct cause (or otherwise) of using any artifacts released by the pints research team, as well as full responsibility for the consequences of his / her usage (or implementation) of any such released artifacts. The user also indemnifies Pints Research Team (and any of its members or agents) of any damage, related or unrelated, to the release or subsequent usage of any findings, artifacts or code by the team.
259
 
260
- For the avoidance of doubt, **any artifacts released by the Pints Research team are done so in accordance with the "fair use"** clause of Copyright Law, in hopes that this will aid the research community in bringing LLMs to the next frontier.
1
  ---
2
+ language:
3
+ - en
4
  license: mit
5
  datasets:
6
  - pints-ai/Expository-Prose-V1
 
12
  - togethercomputer/llama-instruct
13
  - LDJnr/Capybara
14
  - HuggingFaceH4/ultrafeedback_binarized
15
  pipeline_tag: text-generation
16
+ extra_gated_prompt: Though best efforts has been made to ensure, as much as possible,
17
+ that all texts in the training corpora are royalty free, this does not constitute
18
+ a legal guarantee that such is the case. **By using any of the models, corpora or
19
+ part thereof, the user agrees to bear full responsibility to do the necessary due
20
+ diligence to ensure that he / she is in compliance with their local copyright laws.
21
+ Additionally, the user agrees to bear any damages arising as a direct cause (or
22
+ otherwise) of using any artifacts released by the pints research team, as well as
23
+ full responsibility for the consequences of his / her usage (or implementation)
24
+ of any such released artifacts. The user also indemnifies Pints Research Team (and
25
+ any of its members or agents) of any damage, related or unrelated, to the release
26
+ or subsequent usage of any findings, artifacts or code by the team. For the avoidance
27
+ of doubt, any artifacts released by the Pints Research team are done so in accordance
28
+ with the 'fair use' clause of Copyright Law, in hopes that this will aid the research
29
+ community in bringing LLMs to the next frontier.
30
  extra_gated_fields:
31
  Company: text
32
  Country: country
 
39
  - label: Other
40
  value: other
41
  I agree to use this model for in accordance to the afore-mentioned Terms of Use: checkbox
42
+ model-index:
43
+ - name: 1.5-Pints
44
+ results:
45
+ - task:
46
+ type: text-generation
47
+ dataset:
48
+ name: MTBench
49
+ type: ai2_arc
50
+ metrics:
51
+ - type: LLM-as-a-Judge
52
+ value: 3.73
53
+ name: MTBench
54
+ source:
55
+ url: https://huggingface.co/spaces/lmsys/mt-bench
56
+ name: MTBench
57
+ - task:
58
+ type: text-generation
59
+ name: Text Generation
60
+ dataset:
61
+ name: IFEval (0-Shot)
62
+ type: HuggingFaceH4/ifeval
63
+ args:
64
+ num_few_shot: 0
65
+ metrics:
66
+ - type: inst_level_strict_acc and prompt_level_strict_acc
67
+ value: 17.62
68
+ name: strict accuracy
69
+ source:
70
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
71
+ name: Open LLM Leaderboard
72
+ - task:
73
+ type: text-generation
74
+ name: Text Generation
75
+ dataset:
76
+ name: BBH (3-Shot)
77
+ type: BBH
78
+ args:
79
+ num_few_shot: 3
80
+ metrics:
81
+ - type: acc_norm
82
+ value: 2.37
83
+ name: normalized accuracy
84
+ source:
85
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
86
+ name: Open LLM Leaderboard
87
+ - task:
88
+ type: text-generation
89
+ name: Text Generation
90
+ dataset:
91
+ name: MATH Lvl 5 (4-Shot)
92
+ type: hendrycks/competition_math
93
+ args:
94
+ num_few_shot: 4
95
+ metrics:
96
+ - type: exact_match
97
+ value: 0.0
98
+ name: exact match
99
+ source:
100
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
101
+ name: Open LLM Leaderboard
102
+ - task:
103
+ type: text-generation
104
+ name: Text Generation
105
+ dataset:
106
+ name: GPQA (0-shot)
107
+ type: Idavidrein/gpqa
108
+ args:
109
+ num_few_shot: 0
110
+ metrics:
111
+ - type: acc_norm
112
+ value: 0.0
113
+ name: acc_norm
114
+ source:
115
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
116
+ name: Open LLM Leaderboard
117
+ - task:
118
+ type: text-generation
119
+ name: Text Generation
120
+ dataset:
121
+ name: MuSR (0-shot)
122
+ type: TAUR-Lab/MuSR
123
+ args:
124
+ num_few_shot: 0
125
+ metrics:
126
+ - type: acc_norm
127
+ value: 1.84
128
+ name: acc_norm
129
+ source:
130
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
131
+ name: Open LLM Leaderboard
132
+ - task:
133
+ type: text-generation
134
+ name: Text Generation
135
+ dataset:
136
+ name: MMLU-PRO (5-shot)
137
+ type: TIGER-Lab/MMLU-Pro
138
+ config: main
139
+ split: test
140
+ args:
141
+ num_few_shot: 5
142
+ metrics:
143
+ - type: acc
144
+ value: 1.15
145
+ name: accuracy
146
+ source:
147
+ url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=pints-ai/1.5-Pints-2K-v0.1
148
+ name: Open LLM Leaderboard
149
  ---
150
 
151
  # 1.5-Pints -- A model pretrained in 9 days by using high quality data
 
347
 
348
  Additionally, the **user agrees to bear any damages** arising as a direct cause (or otherwise) of using any artifacts released by the pints research team, as well as full responsibility for the consequences of his / her usage (or implementation) of any such released artifacts. The user also indemnifies Pints Research Team (and any of its members or agents) of any damage, related or unrelated, to the release or subsequent usage of any findings, artifacts or code by the team.
349
 
350
+ For the avoidance of doubt, **any artifacts released by the Pints Research team are done so in accordance with the "fair use"** clause of Copyright Law, in hopes that this will aid the research community in bringing LLMs to the next frontier.
351
+ # [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
352
+ Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_pints-ai__1.5-Pints-2K-v0.1)
353
+
354
+ | Metric |Value|
355
+ |-------------------|----:|
356
+ |Avg. | 3.83|
357
+ |IFEval (0-Shot) |17.62|
358
+ |BBH (3-Shot) | 2.37|
359
+ |MATH Lvl 5 (4-Shot)| 0.00|
360
+ |GPQA (0-shot) | 0.00|
361
+ |MuSR (0-shot) | 1.84|
362
+ |MMLU-PRO (5-shot) | 1.15|
363
+
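
The `Avg.` row in the added table appears to be the unweighted mean of the six benchmark scores. The PR does not state how the average is computed, so equal weighting is an assumption here, and the names `scores` and `average` are illustrative only. A minimal sketch that checks the arithmetic against the values in the table:

```python
# Scores copied from the table added to README.md in this PR.
# Assumption: "Avg." is the plain arithmetic mean of the six rows.
scores = {
    "IFEval (0-Shot)": 17.62,
    "BBH (3-Shot)": 2.37,
    "MATH Lvl 5 (4-Shot)": 0.00,
    "GPQA (0-shot)": 0.00,
    "MuSR (0-shot)": 1.84,
    "MMLU-PRO (5-shot)": 1.15,
}

# Unweighted mean across all benchmarks.
average = sum(scores.values()) / len(scores)

print(f"Avg. = {average:.2f}")  # prints "Avg. = 3.83"
```

The result matches the 3.83 reported in the table, which is consistent with an equal-weight average across the six benchmarks.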