# Model Card for Model ID

## Model Details

### Model Description
This model was trained with DPO on the argilla/dpo-mix-7k
dataset, starting from the rungao2001/vicuna-7b-v1.5_deita10k_sft_full
model.
- Model type: Llama2 Decoder-Only
- Language(s) (NLP): English
- License: llama2
- Finetuned from model: rungao2001/vicuna-7b-v1.5_deita10k_sft_full
## Training Details

### Training Data
argilla/dpo-mix-7k
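The preference pairs can be inspected directly with the `datasets` library. A minimal sketch (split and column names follow the dataset's own configuration, not this card):

```python
from datasets import load_dataset

# Preference dataset used for DPO training.
dpo_mix = load_dataset("argilla/dpo-mix-7k")

print(dpo_mix)                         # available splits and their sizes
print(dpo_mix["train"].column_names)   # chosen/rejected preference columns
```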
### Training Procedure

Direct Preference Optimization (DPO), using trl's DPOTrainer.
Notice: the chat_template was modified because the original Vicuna 1.1 format cannot be used with trl.DPOTrainer. The modified template avoids the error "Conversation roles must alternate user/assistant/user/assistant/..." and outputs the system message only when loop.index0 == 0 and role == 'user'.
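For illustration, a chat template in this spirit might look like the sketch below. This is a hypothetical reconstruction, not the exact template shipped with the model: it keeps the Vicuna 1.1 `USER:`/`ASSISTANT:` turns in strict alternation (as `trl.DPOTrainer` expects) and emits the system prompt only on the first user turn (`loop.index0 == 0`).

```python
from transformers import AutoTokenizer

# Hypothetical Vicuna 1.1-style chat template: strict user/assistant alternation,
# with the system prompt emitted only before the very first user message.
VICUNA_DPO_TEMPLATE = (
    "{% for message in messages %}"
    "{% if message['role'] == 'user' %}"
    "{% if loop.index0 == 0 %}"
    "{{ 'A chat between a curious user and an artificial intelligence assistant. "
    "The assistant gives helpful, detailed, and polite answers.\\n\\n' }}"
    "{% endif %}"
    "{{ 'USER: ' + message['content'] + '\\n' }}"
    "{% elif message['role'] == 'assistant' %}"
    "{{ 'ASSISTANT: ' + message['content'] + eos_token + '\\n' }}"
    "{% endif %}"
    "{% endfor %}"
    "{% if add_generation_prompt %}{{ 'ASSISTANT:' }}{% endif %}"
)

tokenizer = AutoTokenizer.from_pretrained("rungao2001/vicuna-7b-v1.5_deita10k_sft_full")
tokenizer.chat_template = VICUNA_DPO_TEMPLATE

messages = [{"role": "user", "content": "What is DPO?"}]
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))
```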
#### Training Hyperparameters
- Precision: BFloat16
- Chat Template: Modified Vicuna 1.1
- Global Batch Size: 128
- Learning Rate: 1.0e-6
- Num Epochs: 3
- Max Prompt Length: 1800
- Max Length: 2048
- Training Steps: 156
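These hyperparameters map naturally onto trl's `DPOConfig`/`DPOTrainer`. The sketch below is an assumed reconstruction, not the exact training script: the beta value, GPU count, and per-device batch split are not stated in this card, only that their product gives the global batch size of 128.

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "rungao2001/vicuna-7b-v1.5_deita10k_sft_full"
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)  # carries the modified Vicuna 1.1 chat template

args = DPOConfig(
    output_dir="vicuna-7b-v1.5-dpo-mix-7k",   # assumed output name
    bf16=True,                                # Precision: BFloat16
    learning_rate=1.0e-6,
    num_train_epochs=3,
    max_prompt_length=1800,
    max_length=2048,
    per_device_train_batch_size=8,            # assumed split: 8 x 2 grad-accum x 8 GPUs = 128 global
    gradient_accumulation_steps=2,
)

trainer = DPOTrainer(
    model=model,                              # the reference model is created internally when not given
    args=args,
    train_dataset=load_dataset("argilla/dpo-mix-7k", split="train"),
    processing_class=tokenizer,               # named `tokenizer` in older trl releases
)
trainer.train()
```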
## Evaluation
Training finally reached loss = 0.5006 and rewards/accuracies = 78.72% on the eval split of argilla/dpo-mix-7k.
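For context, `rewards/accuracies` in trl's `DPOTrainer` is the fraction of preference pairs for which the policy's implicit reward for the chosen response exceeds the reward for the rejected one. A minimal sketch of that computation (the beta value here is illustrative, not taken from this card):

```python
import torch

def dpo_rewards_accuracy(policy_chosen_logps: torch.Tensor,
                         policy_rejected_logps: torch.Tensor,
                         ref_chosen_logps: torch.Tensor,
                         ref_rejected_logps: torch.Tensor,
                         beta: float = 0.1) -> torch.Tensor:
    # Implicit DPO reward of a response: beta * (log pi_theta(y|x) - log pi_ref(y|x)).
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # rewards/accuracies: share of pairs where the chosen response gets the higher reward.
    return (chosen_rewards > rejected_rewards).float().mean()
```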