
model_hh_usp3_200

This model is a fine-tuned version of meta-llama/Llama-2-7b-chat-hf on an unknown dataset. It achieves the following results on the evaluation set:

  • Loss: 1.8202
  • Rewards/chosen: -6.0974
  • Rewards/rejected: -8.9863
  • Rewards/accuracies: 0.6300
  • Rewards/margins: 2.8889
  • Logps/rejected: -120.5816
  • Logps/chosen: -117.1597
  • Logits/rejected: -0.0796
  • Logits/chosen: -0.0123
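The metric names above (Rewards/chosen, Rewards/rejected, Rewards/margins) follow the convention used by preference-optimization trainers such as TRL's DPOTrainer, where the margin is the mean difference between the chosen and rejected rewards. A quick sanity check on the reported evaluation numbers (a sketch; the exact logging convention of this training run is an assumption):

```python
# Reported evaluation metrics copied from the model card.
rewards_chosen = -6.0974
rewards_rejected = -8.9863
reported_margin = 2.8889

# Assumed convention: Rewards/margins = Rewards/chosen - Rewards/rejected.
margin = rewards_chosen - rewards_rejected
print(round(margin, 4))  # matches the reported 2.8889
```

The same identity holds row by row in the training-results table below the hyperparameters.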

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 0.0005
  • train_batch_size: 4
  • eval_batch_size: 1
  • seed: 42
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 16
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_steps: 100
  • training_steps: 1000
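Note that total_train_batch_size (16) is train_batch_size × gradient_accumulation_steps (4 × 4). The learning-rate schedule, linear warmup for 100 steps followed by cosine decay over the remaining 900 of the 1000 training steps, can be sketched as follows. This mirrors the shape of the cosine-with-warmup schedule in Transformers, not the library implementation itself:

```python
import math

# Hyperparameters from the model card.
LEARNING_RATE = 5e-4
WARMUP_STEPS = 100
TRAINING_STEPS = 1000

def lr_at(step: int) -> float:
    """Learning rate at a given optimizer step: linear warmup, then cosine decay."""
    if step < WARMUP_STEPS:
        return LEARNING_RATE * step / WARMUP_STEPS
    progress = (step - WARMUP_STEPS) / (TRAINING_STEPS - WARMUP_STEPS)
    return LEARNING_RATE * 0.5 * (1.0 + math.cos(math.pi * progress))

print(lr_at(0))     # 0.0 at the start of warmup
print(lr_at(100))   # peak rate 5e-4 at the end of warmup
print(lr_at(1000))  # decays to 0.0 at the final step
```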

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 8.0 | 100 | 1.8186 | -5.9435 | -8.8369 | 0.6300 | 2.8934 | -120.4156 | -116.9887 | -0.0816 | -0.0143 |
| 0.0 | 16.0 | 200 | 1.8025 | -6.0024 | -8.9192 | 0.6200 | 2.9168 | -120.5071 | -117.0542 | -0.0812 | -0.0142 |
| 0.0 | 24.0 | 300 | 1.7997 | -6.0563 | -8.9882 | 0.6300 | 2.9319 | -120.5837 | -117.1141 | -0.0806 | -0.0133 |
| 0.0 | 32.0 | 400 | 1.8278 | -6.0899 | -8.9796 | 0.6300 | 2.8898 | -120.5742 | -117.1513 | -0.0794 | -0.0122 |
| 0.0 | 40.0 | 500 | 1.8304 | -6.1315 | -9.0197 | 0.6300 | 2.8882 | -120.6187 | -117.1977 | -0.0796 | -0.0128 |
| 0.0 | 48.0 | 600 | 1.8260 | -6.0887 | -8.9905 | 0.6300 | 2.9018 | -120.5863 | -117.1501 | -0.0804 | -0.0131 |
| 0.0 | 56.0 | 700 | 1.8303 | -6.1106 | -8.9877 | 0.6300 | 2.8771 | -120.5832 | -117.1744 | -0.0801 | -0.0132 |
| 0.0 | 64.0 | 800 | 1.8139 | -6.0927 | -8.9885 | 0.6300 | 2.8957 | -120.5840 | -117.1545 | -0.0786 | -0.0117 |
| 0.0 | 72.0 | 900 | 1.8263 | -6.1113 | -8.9822 | 0.6200 | 2.8709 | -120.5770 | -117.1752 | -0.0789 | -0.0120 |
| 0.0 | 80.0 | 1000 | 1.8202 | -6.0974 | -8.9863 | 0.6300 | 2.8889 | -120.5816 | -117.1597 | -0.0796 | -0.0123 |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.39.1
  • Pytorch 2.2.1+cu121
  • Datasets 2.18.0
  • Tokenizers 0.15.2
Model repository: guoyu-zhang/model_hh_usp3_200, a PEFT adapter for meta-llama/Llama-2-7b-chat-hf.