thomwolf (HF staff) committed
Commit 24fd7d1
• 1 Parent(s): d16cee2

Add details on the datasets for reproducibility

Files changed (1)
  1. src/assets/text_content.py +13 -6
src/assets/text_content.py CHANGED
@@ -77,10 +77,9 @@ With the plethora of large language models (LLMs) and chatbots being released we
 
 We chose these benchmarks as they test a variety of reasoning and general knowledge across a wide variety of fields in 0-shot and few-shot settings.
 
-
 # Some good practices before submitting a model
 
-## 1) Make sure you can load your model and tokenizer using AutoClasses:
+### 1) Make sure you can load your model and tokenizer using AutoClasses:
 ```python
 from transformers import AutoConfig, AutoModel, AutoTokenizer
 config = AutoConfig.from_pretrained("your model name", revision=revision)
@@ -92,16 +91,24 @@ If this step fails, follow the error messages to debug your model before submitt
 Note: make sure your model is public!
 Note: if your model needs `use_remote_code=True`, we do not support this option yet but we are working on adding it, stay posted!
 
-## 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
+### 2) Convert your model weights to [safetensors](https://huggingface.co/docs/safetensors/index)
 It's a new format for storing weights which is safer and faster to load and use. It will also allow us to add the number of weights of your model to the `Extended Viewer`!
 
-## 3) Make sure your model has an open license!
+### 3) Make sure your model has an open license!
 This is a leaderboard for Open LLMs, and we'd love for as many people as possible to know they can use your model 🤗
 
-## 4) Fill up your model card
+### 4) Fill up your model card
 When we add extra information about models to the leaderboard, it will be automatically taken from the model card
 
-# Reproduction
+# Reproducibility and details
+
+### Details and logs
+You can find:
+- detailed numerical results in the `results` Hugging Face dataset: https://huggingface.co/datasets/open-llm-leaderboard/results
+- details on the input/outputs for the models in the `details` Hugging Face dataset: https://huggingface.co/datasets/open-llm-leaderboard/details
+- community queries and running status in the `requests` Hugging Face dataset: https://huggingface.co/datasets/open-llm-leaderboard/requests
+
+### Reproducibility
 To reproduce our results, here are the commands you can run, using [this version](https://github.com/EleutherAI/lm-evaluation-harness/tree/e47e01beea79cfe87421e2dac49e64d499c240b4) of the Eleuther AI Harness:
 `python main.py --model=hf-causal --model_args="pretrained=<your_model>,use_accelerate=True,revision=<your_model_revision>"`
 ` --tasks=<task_list> --num_fewshot=<n_few_shot> --batch_size=2 --output_path=<output_path>`
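The step 1 snippet in the first hunk is cut off at the hunk boundary; a complete version of that check might look like the sketch below. The model name and `revision` value are placeholders, and the `AutoModel`/`AutoTokenizer` lines are an assumption about the elided portion.

```python
# Sketch of the step 1 sanity check: config, model, and tokenizer should all
# load through the Auto* classes. "your model name" is a placeholder repo id.
from transformers import AutoConfig, AutoModel, AutoTokenizer

revision = "main"  # or a specific commit sha on the Hub
config = AutoConfig.from_pretrained("your model name", revision=revision)
model = AutoModel.from_pretrained("your model name", revision=revision)
tokenizer = AutoTokenizer.from_pretrained("your model name", revision=revision)
```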
 
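For step 2, one way to produce safetensors weights, assuming a PyTorch checkpoint that already loads with transformers, is to re-save it with safe serialization. The repo id and output folder below are placeholders.

```python
# Hedged sketch for step 2: re-save an existing PyTorch checkpoint in the
# safetensors format, then upload the resulting folder to the Hub.
from transformers import AutoModel

model = AutoModel.from_pretrained("your-org/your-model")
model.save_pretrained("converted-checkpoint", safe_serialization=True)
```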
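To inspect the `results`, `details`, or `requests` data listed under "Details and logs", one option is to snapshot the dataset repo locally with `huggingface_hub`; the file layout inside each repo is an assumption here, so browse it on the Hub first.

```python
# Hedged sketch: download the raw result files locally for inspection.
# The internal layout of the dataset repo is an assumption -- see
# https://huggingface.co/datasets/open-llm-leaderboard/results to confirm.
from huggingface_hub import snapshot_download

local_path = snapshot_download(
    repo_id="open-llm-leaderboard/results",
    repo_type="dataset",
)
print(local_path)  # the downloaded result files live under this directory
```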