# zephyr-7b-dpo-full
This model was trained from scratch; the training dataset is not specified in the card (the auto-generated field reads `None`). It achieves the following results on the evaluation set (see the sketch after the list for how the reward columns are conventionally computed):
- Loss: 0.6630
- Rewards/chosen: -0.7285
- Rewards/rejected: -0.8540
- Rewards/gen: -1.5242
- Rewards/accuracies: 0.5580
- Rewards/margins: 0.1255
- Logps/rejected: -248.7312
- Logps/chosen: -293.7192
- Logps/response: -180.2232
- Logits/rejected: -0.1364
- Logits/chosen: -0.1467
- Logits/response: -0.1655
- Improvement: 0.4559
- Penalty: 0.2162
- Weight: 0.4710
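For context, the `Rewards/*` columns follow the usual DPO bookkeeping: beta-scaled log-probability ratios of the policy against a frozen reference model. The extra columns (`Rewards/gen`, `Logps/response`, `Improvement`, `Penalty`, `Weight`) point to a modified objective that this card does not document. Below is a minimal sketch of the standard part only; `beta` and all tensor values are illustrative assumptions, not the run's actual configuration.

```python
import torch

# Illustrative DPO reward bookkeeping; the card does not include the
# actual training code, so beta and all values here are assumptions.
beta = 0.1  # assumed DPO scaling factor

# Per-example sequence log-probs under the policy and the frozen reference.
policy_chosen_logps = torch.tensor([-293.7, -290.1])
policy_rejected_logps = torch.tensor([-248.7, -251.3])
ref_chosen_logps = torch.tensor([-286.4, -288.0])
ref_rejected_logps = torch.tensor([-240.2, -249.9])

# "Rewards" are the beta-scaled log-ratios against the reference model.
rewards_chosen = beta * (policy_chosen_logps - ref_chosen_logps)
rewards_rejected = beta * (policy_rejected_logps - ref_rejected_logps)

# Margin: how much more the policy prefers the chosen response.
rewards_margins = rewards_chosen - rewards_rejected

# Accuracy: fraction of pairs where the chosen response wins.
rewards_accuracies = (rewards_margins > 0).float().mean()

# DPO loss: negative log-sigmoid of the margin.
loss = -torch.nn.functional.logsigmoid(rewards_margins).mean()
print(rewards_margins, rewards_accuracies, loss)
```

Read this way, `Rewards/margins: 0.1255` and `Rewards/accuracies: 0.5580` say the trained policy prefers the chosen response over the rejected one by a small margin, in roughly 56% of evaluation pairs.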
## Model description
More information needed
## Intended uses & limitations
More information needed
## Training and evaluation data
More information needed
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training (a sketch reconstructing them as a `TrainingArguments` object follows the list):
- learning_rate: 5e-07
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 4
- gradient_accumulation_steps: 16
- total_train_batch_size: 64
- total_eval_batch_size: 8
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 1
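The effective batch size follows from the per-device settings: total_train_batch_size = train_batch_size × num_devices × gradient_accumulation_steps = 1 × 4 × 16 = 64. A minimal sketch of the equivalent `transformers` configuration, assuming the run used `TrainingArguments` (the `output_dir` is hypothetical; device count comes from the launcher, e.g. `accelerate`, not from these arguments):

```python
from transformers import TrainingArguments

# Sketch reconstructing the listed hyperparameters; output_dir is a
# hypothetical path, everything else mirrors the list above.
args = TrainingArguments(
    output_dir="zephyr-7b-dpo-full",   # hypothetical output path
    learning_rate=5e-07,
    per_device_train_batch_size=1,     # train_batch_size above
    per_device_eval_batch_size=2,      # eval_batch_size above
    gradient_accumulation_steps=16,
    seed=42,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    # Adam with betas=(0.9, 0.999) and epsilon=1e-08, as listed above:
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-08,
)
# Effective batch: 1 per device x 4 GPUs x 16 accumulation steps = 64,
# matching total_train_batch_size in the list.
```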
### Training results
| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/gen | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logps/response | Logits/rejected | Logits/chosen | Logits/response | Improvement | Penalty | Weight |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.7082 | 0.3140 | 100 | 0.7601 | -0.4820 | -0.4739 | -1.0503 | 0.4980 | -0.0081 | -244.9303 | -291.2541 | -175.4841 | -0.1680 | -0.1839 | -0.2030 | 0.4989 | 0.2710 | 0.2694 |
| 0.611 | 0.6281 | 200 | 0.6897 | -0.6051 | -0.6907 | -1.3230 | 0.5360 | 0.0856 | -247.0981 | -292.4845 | -178.2118 | -0.1450 | -0.1573 | -0.1758 | 0.4651 | 0.2284 | 0.3626 |
| 0.5986 | 0.9421 | 300 | 0.6630 | -0.7285 | -0.8540 | -1.5242 | 0.5580 | 0.1255 | -248.7312 | -293.7192 | -180.2232 | -0.1364 | -0.1467 | -0.1655 | 0.4559 | 0.2162 | 0.4710 |
### Framework versions
- Transformers 4.45.2
- Pytorch 2.4.0+cu121
- Datasets 2.14.6
- Tokenizers 0.20.1
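If reproducing the run, a quick sanity check that the environment matches these pins (a sketch; assumes the packages are installed and importable under these names):

```python
# Hypothetical check that the runtime matches the versions listed above.
import datasets
import tokenizers
import torch
import transformers

assert transformers.__version__ == "4.45.2"
assert torch.__version__.startswith("2.4.0")  # "+cu121" suffix on CUDA builds
assert datasets.__version__ == "2.14.6"
assert tokenizers.__version__ == "0.20.1"
print("environment matches the card")
```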