radm's picture
Update README.md
6cdf428 verified
---
library_name: peft
base_model: NousResearch/Meta-Llama-3-70B-Instruct
license: apache-2.0
---
# Model Card for radm/Llama-3-70B-Instruct-AH-AWQ
<!-- Provide a quick summary of what the model is/does. -->
This model fine-tuned to be a judge on Arena Hard (https://github.com/lm-sys/arena-hard-auto).
The base model was trained using LoRA, combined with an adapter and converted to AWQ format.
Only LoRA adapter for base model can be found here (https://huggingface.co/radm/Llama-3-70B-Instruct-AH-lora)
## Model Details
### Model Description
<!-- Provide a longer summary of what this model is. -->
- **Developed by:** [radm]
- **Model type:** [Llama-3-70b]
- **Language(s) (NLP):** [English]
- **License:** [apache-2.0]
- **Finetuned from model [optional]:** [NousResearch/Meta-Llama-3-70B-Instruct]
## Uses
<!-- Address questions around how the model is intended to be used, including the foreseeable users of the model and those affected by the model. -->
Use repository (https://github.com/r4dm/arena-hard-local) for evaluate with local judge model.
### Results
#### Llama-3-70B-Instruct-GPTQ as judge:
```console
Llama-3-Instruct-8B-SimPO | score: 78.3 | 95% CI: (-1.5, 1.2) | average #tokens: 545
SELM-Llama-3-8B-Instruct-iter-3 | score: 72.8 | 95% CI: (-2.1, 1.4) | average #tokens: 606
Meta-Llama-3-8B-Instruct-f16 | score: 65.3 | 95% CI: (-1.8, 2.1) | average #tokens: 560
suzume-llama-3-8B-multilingual-orpo-borda-half | score: 63.5 | 95% CI: (-1.6, 2.1) | average #tokens: 978
Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
suzume-llama-3-8B-multilingual | score: 48.1 | 95% CI: (-2.2, 1.8) | average #tokens: 767
aya-23-8B | score: 48.0 | 95% CI: (-2.0, 2.1) | average #tokens: 834
Vikhr-7B-instruct_0.5 | score: 19.6 | 95% CI: (-1.3, 1.5) | average #tokens: 794
alpindale_gemma-2b-it | score: 11.2 | 95% CI: (-1.0, 0.8) | average #tokens: 425
```
#### Llama-3-70B-Instruct-AH-AWQ as judge:
```console
Llama-3-Instruct-8B-SimPO | score: 83.8 | 95% CI: (-1.4, 1.3) | average #tokens: 545
SELM-Llama-3-8B-Instruct-iter-3 | score: 78.8 | 95% CI: (-1.7, 1.9) | average #tokens: 606
suzume-llama-3-8B-multilingual-orpo-borda-half | score: 71.8 | 95% CI: (-1.7, 2.4) | average #tokens: 978
Meta-Llama-3-8B-Instruct-f16 | score: 69.8 | 95% CI: (-1.9, 1.7) | average #tokens: 560
suzume-llama-3-8B-multilingual | score: 54.0 | 95% CI: (-2.1, 2.1) | average #tokens: 767
aya-23-8B | score: 50.4 | 95% CI: (-1.7, 1.7) | average #tokens: 834
Phi-3-medium-128k-instruct | score: 50.0 | 95% CI: (0.0, 0.0) | average #tokens: 801
Vikhr-7B-instruct_0.5 | score: 14.2 | 95% CI: (-1.3, 1.0) | average #tokens: 794
alpindale_gemma-2b-it | score: 7.9 | 95% CI: (-0.9, 0.8) | average #tokens: 425
```
## Training Details
### Training Data
<!-- This should link to a Dataset Card, perhaps with a short stub of information on what the training data is all about as well as documentation related to data pre-processing or additional filtering. -->
Datasets:
- radm/arenahard_gpt4vsllama3
- radm/truthy-dpo-v0.1-ru
- jondurbin/truthy-dpo-v0.1
#### Training Hyperparameters
- **Training regime:** [bf16] <!--fp32, fp16 mixed precision, bf16 mixed precision, bf16 non-mixed precision, fp16 non-mixed precision, fp8 mixed precision -->
- **Load in 4 bit:** [True]
- **Target modules:** [all]
- **LoRA rank:** [16]
- **Max seq length:** [8192]
- **Use gradient checkpointing:** [unsloth]
- **trainer:** [ORPOTrainer]
- **Batch size:** [1]
- **Gradient accumulation steps:** [4]
- **Epochs:** [1]
### Hardware
- **Hardware Type:** [Nvidia A100 80 gb]
- **Hours used:** [11 hours]
### Framework versions
- PEFT 0.10.0