notbdq/alooowso · leaderboard-pr-bot committed
Commit d2a8c99 (parent: e251847)

Adding Evaluation Results (#1)


- Adding Evaluation Results (4b672433b21196620948df6ea2acdb7f55b557f2)


Co-authored-by: Open LLM Leaderboard PR Bot <[email protected]>

Files changed (1): README.md (+118 -2)
README.md CHANGED
@@ -1,9 +1,112 @@
 ---
-license: apache-2.0
 language:
 - tr
 - en
+license: apache-2.0
 library_name: transformers
+model-index:
+- name: alooowso
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: AI2 Reasoning Challenge (25-Shot)
+      type: ai2_arc
+      config: ARC-Challenge
+      split: test
+      args:
+        num_few_shot: 25
+    metrics:
+    - type: acc_norm
+      value: 62.97
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=notbdq/alooowso
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: HellaSwag (10-Shot)
+      type: hellaswag
+      split: validation
+      args:
+        num_few_shot: 10
+    metrics:
+    - type: acc_norm
+      value: 84.87
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=notbdq/alooowso
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU (5-Shot)
+      type: cais/mmlu
+      config: all
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 60.78
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=notbdq/alooowso
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: TruthfulQA (0-shot)
+      type: truthful_qa
+      config: multiple_choice
+      split: validation
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: mc2
+      value: 68.18
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=notbdq/alooowso
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: Winogrande (5-shot)
+      type: winogrande
+      config: winogrande_xl
+      split: validation
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 77.43
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=notbdq/alooowso
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GSM8k (5-shot)
+      type: gsm8k
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 39.58
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard?query=notbdq/alooowso
+      name: Open LLM Leaderboard
 ---
 A fine-tuned model of mistral-7b-instruct-v0-2. The dataset used for fine-tuning is a small, custom question-answering dataset in Turkish that I made myself; the main goal was to better adapt the model to the Turkish language.
 prompt format:
@@ -32,4 +135,17 @@ snake_size = 3
 class Snake:
     def __init__(self):
 ```
-[output cut for brevity]
+[output cut for brevity]
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_notbdq__alooowso)
+
+| Metric                            | Value |
+|-----------------------------------|------:|
+| Avg.                              | 65.63 |
+| AI2 Reasoning Challenge (25-Shot) | 62.97 |
+| HellaSwag (10-Shot)               | 84.87 |
+| MMLU (5-Shot)                     | 60.78 |
+| TruthfulQA (0-shot)               | 68.18 |
+| Winogrande (5-shot)               | 77.43 |
+| GSM8k (5-shot)                    | 39.58 |
+
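For readers of the card this commit extends, a minimal usage sketch may help; it is not part of the commit. It assumes the standard `transformers` text-generation API, takes the repo id `notbdq/alooowso` from the leaderboard URLs above, and assumes the `[INST]`-style chat template is inherited from Mistral-7B-Instruct-v0.2.

```python
# Minimal usage sketch, not part of the commit: repo id inferred from the
# leaderboard URLs above; chat template assumed inherited from
# Mistral-7B-Instruct-v0.2.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "notbdq/alooowso"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

# A Turkish question, matching the fine-tuning domain described in the card.
messages = [{"role": "user", "content": "Türkiye'nin başkenti neresidir?"}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output_ids = model.generate(input_ids, max_new_tokens=128, do_sample=False)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(output_ids[0][input_ids.shape[-1]:], skip_special_tokens=True))
```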
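The `model-index` block added by this commit is machine-readable. A sketch of how tooling consumes it, assuming the `huggingface_hub` `ModelCard` API, where each dataset/metric pair in the YAML surfaces as an `EvalResult`:

```python
# Sketch of reading the model-index metadata added by this commit,
# assuming the huggingface_hub ModelCard API.
from huggingface_hub import ModelCard

card = ModelCard.load("notbdq/alooowso")
results = card.data.eval_results  # one EvalResult per dataset/metric pair

for r in results:
    print(f"{r.dataset_name}: {r.metric_type} = {r.metric_value}")

# The leaderboard's "Avg." is the plain mean of the six benchmark scores.
scores = [r.metric_value for r in results]
print(f"Avg. = {sum(scores) / len(scores):.2f}")  # ~65.63, matching the table
```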