Adding Evaluation Results

This is an automated PR created with https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr

The purpose of this PR is to add evaluation results from the Open LLM Leaderboard to your model card.

If you encounter any issues, please report them to https://huggingface.co/spaces/Weyaxi/open-llm-leaderboard-results-pr/discussions

Files changed (1)

README.md +114 -5

@@ -1,14 +1,109 @@
 ---
-base_model:
-- maldv/badger-lambda-llama-3-8b
-- maldv/llama-3-fantasy-writer-8b
-- dreamgen-preview/opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5
 library_name: transformers
 tags:
 - fourier
 - task addition
 - merge
-license: cc-by-nc-4.0
 ---
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65b19c1b098c85365af5a83e/SpNOAI3VKUsWegChuQHTk.png)
@@ -61,3 +156,17 @@ Purpose: Exposition ; Descriptive ; Visual Detail, character appearance
 ````
 [Terminal Connection](https://www.maldevide.com/books/terminal_connection.txt)

 ---
+license: cc-by-nc-4.0
 library_name: transformers
 tags:
 - fourier
 - task addition
 - merge
+base_model:
+- maldv/badger-lambda-llama-3-8b
+- maldv/llama-3-fantasy-writer-8b
+- dreamgen-preview/opus-v1.2-llama-3-8b-instruct-run3.5-epoch2.5
+model-index:
+- name: badger-writer-llama-3-8b
+  results:
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: IFEval (0-Shot)
+      type: HuggingFaceH4/ifeval
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: inst_level_strict_acc and prompt_level_strict_acc
+      value: 53.03
+      name: strict accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-writer-llama-3-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: BBH (3-Shot)
+      type: BBH
+      args:
+        num_few_shot: 3
+    metrics:
+    - type: acc_norm
+      value: 26.88
+      name: normalized accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-writer-llama-3-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MATH Lvl 5 (4-Shot)
+      type: hendrycks/competition_math
+      args:
+        num_few_shot: 4
+    metrics:
+    - type: exact_match
+      value: 6.57
+      name: exact match
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-writer-llama-3-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: GPQA (0-shot)
+      type: Idavidrein/gpqa
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 5.26
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-writer-llama-3-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MuSR (0-shot)
+      type: TAUR-Lab/MuSR
+      args:
+        num_few_shot: 0
+    metrics:
+    - type: acc_norm
+      value: 3.2
+      name: acc_norm
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-writer-llama-3-8b
+      name: Open LLM Leaderboard
+  - task:
+      type: text-generation
+      name: Text Generation
+    dataset:
+      name: MMLU-PRO (5-shot)
+      type: TIGER-Lab/MMLU-Pro
+      config: main
+      split: test
+      args:
+        num_few_shot: 5
+    metrics:
+    - type: acc
+      value: 30.67
+      name: accuracy
+    source:
+      url: https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard?query=maldv/badger-writer-llama-3-8b
+      name: Open LLM Leaderboard
 ---
 ![image/png](https://cdn-uploads.huggingface.co/production/uploads/65b19c1b098c85365af5a83e/SpNOAI3VKUsWegChuQHTk.png)
 ````
 [Terminal Connection](https://www.maldevide.com/books/terminal_connection.txt)
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/open-llm-leaderboard/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_maldv__badger-writer-llama-3-8b)
+| Metric |Value|
+|-------------------|----:|
+|Avg. |20.93|
+|IFEval (0-Shot) |53.03|
+|BBH (3-Shot) |26.88|
+|MATH Lvl 5 (4-Shot)| 6.57|
+|GPQA (0-shot) | 5.26|
+|MuSR (0-shot) | 3.20|
+|MMLU-PRO (5-shot) |30.67|
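
As a quick sanity check on the table above, the `Avg.` row appears consistent with an unweighted mean of the six benchmark scores. The sketch below assumes that is how the average is formed; the PR itself does not state the formula, and the `unweighted_mean` helper is a hypothetical name used only for illustration.

```python
# Scores copied from the results table in this PR (percentages).
scores = {
    "IFEval (0-Shot)": 53.03,
    "BBH (3-Shot)": 26.88,
    "MATH Lvl 5 (4-Shot)": 6.57,
    "GPQA (0-shot)": 5.26,
    "MuSR (0-shot)": 3.20,
    "MMLU-PRO (5-shot)": 30.67,
}


def unweighted_mean(values):
    """Return the arithmetic mean of an iterable of numbers.

    Hypothetical helper: assumes every benchmark carries equal weight.
    """
    values = list(values)
    return sum(values) / len(values)


average = unweighted_mean(scores.values())
print(f"Avg. = {average:.3f}")
```

The six values sum to 125.61, so the mean is about 20.935, which agrees with the reported 20.93 up to rounding in the last digit.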