# Model Card for Model ID

## Model Details

### Model Description
This model was trained with DPO on the argilla/dpo-mix-7k
dataset, starting from the rungao2001/vicuna-7b-v1.5_deita10k_sft_full
model.
- Model type: Llama2 Decoder-Only
- Language(s) (NLP): English
- License: llama2
- Finetuned from model: rungao2001/vicuna-7b-v1.5_deita10k_sft_full
## Training Details

### Training Data
argilla/dpo-mix-7k
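The preference pairs can be inspected directly with the `datasets` library. A minimal sketch (split and column names follow the dataset's own configuration, not this card):

```python
from datasets import load_dataset

# Preference dataset used for DPO training.
dpo_mix = load_dataset("argilla/dpo-mix-7k")

print(dpo_mix)                         # available splits and their sizes
print(dpo_mix["train"].column_names)   # chosen/rejected preference columns
```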
### Training Procedure

Direct Preference Optimization (DPO), using trl's DPOTrainer.
Notice: the chat_template was modified because the original Vicuna 1.1 format cannot be used with trl.DPOTrainer. The modified template avoids the error "Conversation roles must alternate user/assistant/user/assistant/..." and outputs the system message only when loop.index0 == 0 and role == 'user'.
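For illustration, a chat template in this spirit might look like the sketch below. This is a hypothetical reconstruction, not the exact template shipped with the model: it keeps the Vicuna 1.1 `USER:`/`ASSISTANT:` turns in strict alternation (as `trl.DPOTrainer` expects) and emits the system prompt only on the first user turn (`loop.index0 == 0`).

```python
from transformers import AutoTokenizer

# Hypothetical Vicuna 1.1-style chat template: strict user/assistant alternation,
# with the system prompt emitted only before the very first user message.
VICUNA_DPO_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "{% if loop.index0 == 0 %}"
    "{{ 'A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers.\\n\\n' }}"
    "{% endif %}"
    "{{ 'USER: ' + message['content'] + '\\n' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ 'ASSISTANT: ' + message['content'] + eos_token + '\\n' }}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ 'ASSISTANT:' }}{% endif %}"
)

tokenizer = AutoTokenizer.from_pretrained("rungao2001/vicuna-7b-v1.5_deita10k_sft_full")
tokenizer.chat_template = VICUNA_DPO_TEMPLATE

messages = [{"role": "user", "content": "What is DPO?"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```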
#### Training Hyperparameters
- Precision: BFloat16
- Chat Template: Modified Vicuna 1.1
- Global Batch Size: 128
- Learning Rate: 1.0e-6
- Num Epochs: 3
- Max Prompt Length: 1800
- Max Length: 2048
- Training Steps: 156
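These hyperparameters map naturally onto trl's `DPOConfig`/`DPOTrainer`. The sketch below is an assumed reconstruction, not the exact training script: the beta value, GPU count, and per-device batch split are not stated in this card, only that their product gives the global batch size of 128.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "rungao2001/vicuna-7b-v1.5_deita10k_sft_full"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)  # carries the modified Vicuna 1.1 chat template

args = DPOConfig(
    output_dir="vicuna-7b-v1.5-dpo-mix-7k",   # assumed output name
    bf16=True,                                # Precision: BFloat16
    learning_rate=1.0e-6,
    num_train_epochs=3,
    max_prompt_length=1800,
    max_length=2048,
    per_device_train_batch_size=8,            # assumed split: 8 x 2 grad-accum x 8 GPUs = 128 global
    gradient_accumulation_steps=2,
)

trainer = DPOTrainer(
    model=model,                              # the reference model is created internally when not given
    args=args,
    train_dataset=load_dataset("argilla/dpo-mix-7k", split="train"),
    processing_class=tokenizer,               # named `tokenizer` in older trl releases
)
trainer.train()
```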
## Evaluation
Training finally reached loss = 0.5006 and rewards/accuracies = 78.72% on the eval split of argilla/dpo-mix-7k.
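For context, `rewards/accuracies` in trl's `DPOTrainer` is the fraction of preference pairs for which the policy's implicit reward for the chosen response exceeds the reward for the rejected one. A minimal sketch of that computation (the beta value here is illustrative, not taken from this card):

```python
import torch

def dpo_rewards_accuracy(policy_chosen_logps: torch.Tensor,
                         policy_rejected_logps: torch.Tensor,
                         ref_chosen_logps: torch.Tensor,
                         ref_rejected_logps: torch.Tensor,
                         beta: float = 0.1) -> torch.Tensor:
    # Implicit DPO reward of a response: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # rewards/accuracies: share of pairs where the chosen response gets the higher reward.
    return (chosen_rewards > rejected_rewards).float().mean()
```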