Adding Evaluation Results

#8
Files changed (1)
  1. README.md +14 -0
README.md CHANGED
@@ -45,3 +45,17 @@ python train.py configs/llama2_7b_chat_uncensored.yaml
 
 # Fine-tuning guide
 https://georgesung.github.io/ai/qlora-ift/
+
+# [Open LLM Leaderboard Evaluation Results](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
+Detailed results can be found [here](https://huggingface.co/datasets/open-llm-leaderboard/details_georgesung__llama2_7b_chat_uncensored)
+
+| Metric                | Value                     |
+|-----------------------|---------------------------|
+| Avg.                  | 43.39                     |
+| ARC (25-shot)         | 53.58                     |
+| HellaSwag (10-shot)   | 78.66                     |
+| MMLU (5-shot)         | 44.49                     |
+| TruthfulQA (0-shot)   | 41.34                     |
+| Winogrande (5-shot)   | 74.11                     |
+| GSM8K (5-shot)        | 5.84                      |
+| DROP (3-shot)         | 5.69                      |
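
The `Avg.` row is consistent with an unweighted arithmetic mean of the seven benchmark scores listed in the table. The sketch below is only a sanity check of that arithmetic, with the values copied from the added table; the dictionary and the `mean_score` helper are illustrative names, not part of the repository or of the leaderboard tooling, and the assumption that every benchmark is weighted equally is inferred from the numbers rather than stated in the PR.

```python
# Scores copied from the table added in this PR (percentages).
scores = {
    "ARC (25-shot)": 53.58,
    "HellaSwag (10-shot)": 78.66,
    "MMLU (5-shot)": 44.49,
    "TruthfulQA (0-shot)": 41.34,
    "Winogrande (5-shot)": 74.11,
    "GSM8K (5-shot)": 5.84,
    "DROP (3-shot)": 5.69,
}


def mean_score(values):
    """Return the unweighted arithmetic mean, rounded to two decimals.

    Assumption: every benchmark carries equal weight.
    """
    values = list(values)
    return round(sum(values) / len(values), 2)


average = mean_score(scores.values())
print(average)  # 43.39, matching the Avg. row
```

The two lowest scores (GSM8K and DROP) pull the mean well below the other five benchmarks, so the average alone says little about per-task performance.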