OpenELM-1_1B-DPO-full-most-similar

This model was trained from scratch on an unknown dataset. It achieves the following results on the evaluation set (these match the last logged checkpoint, step 2800, in the training results table):

Loss: 1.2575
Rewards/chosen: -6.8438
Rewards/rejected: -7.25
Rewards/accuracies: 0.5215
Rewards/margins: 0.3887
Logps/rejected: -1012.0
Logps/chosen: -1004.0
Logits/rejected: -5.0
Logits/chosen: -6.125
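
The reward and loss figures above can be read through the usual DPO definitions. The sketch below assumes the metrics follow the common DPO trainer convention, where a reward is beta times the gap between policy and reference log-probabilities and the loss is the negative log-sigmoid of the reward margin. The function name, the beta value of 0.1, and the log-probabilities are illustrative assumptions; none of them is stated in this card.

```python
import math

def dpo_metrics(policy_chosen_logp, policy_rejected_logp,
                ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-example DPO quantities (beta=0.1 is an assumed value, not from the card)."""
    # Implicit rewards: scaled log-probability gap between policy and reference.
    chosen_reward = beta * (policy_chosen_logp - ref_chosen_logp)
    rejected_reward = beta * (policy_rejected_logp - ref_rejected_logp)
    margin = chosen_reward - rejected_reward
    # -log(sigmoid(margin)) written as a numerically stable softplus(-margin).
    loss = max(-margin, 0.0) + math.log1p(math.exp(-abs(margin)))
    return {
        "rewards/chosen": chosen_reward,
        "rewards/rejected": rejected_reward,
        "rewards/margins": margin,
        "rewards/accuracies": 1.0 if margin > 0 else 0.0,
        "loss": loss,
    }

# Hypothetical log-probabilities: a policy identical to the reference model.
baseline = dpo_metrics(-100.0, -100.0, -100.0, -100.0)

# Hypothetical log-probabilities: the policy now prefers the chosen response.
improved = dpo_metrics(-90.0, -110.0, -100.0, -100.0)

print(baseline["loss"], improved["rewards/margins"], improved["loss"])
```

Under these definitions a zero margin gives a loss of log 2 (about 0.693) and a pairwise accuracy of 0.5 corresponds to chance, which is a useful baseline when reading the reported Rewards/accuracies of 0.5215.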

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

learning_rate: 5e-05
train_batch_size: 8
eval_batch_size: 16
seed: 42
distributed_type: multi-GPU
num_devices: 4
gradient_accumulation_steps: 2
total_train_batch_size: 64
total_eval_batch_size: 64
optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
lr_scheduler_type: cosine
lr_scheduler_warmup_ratio: 0.1
num_epochs: 3
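
The total batch sizes listed above follow from the per-device values. As a quick check, assuming the usual convention that the effective training batch is per-device batch size times number of devices times gradient accumulation steps (evaluation uses no accumulation), the variable names below simply mirror the hyperparameter names:

```python
# Values copied from the hyperparameter list above.
train_batch_size = 8             # per device
eval_batch_size = 16             # per device
num_devices = 4
gradient_accumulation_steps = 2

# Effective batch per optimizer step during training.
total_train_batch_size = train_batch_size * num_devices * gradient_accumulation_steps

# Evaluation does not accumulate gradients, so only devices multiply.
total_eval_batch_size = eval_batch_size * num_devices

print(total_train_batch_size, total_eval_batch_size)
```

Both products come out to 64, matching total_train_batch_size and total_eval_batch_size as reported.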

Training results

Training Loss	Epoch	Step	Validation Loss	Rewards/chosen	Rewards/rejected	Rewards/accuracies	Rewards/margins	Logps/rejected	Logps/chosen	Logits/rejected	Logits/chosen
0.6224	0.1047	100	0.6811	-0.4375	-0.4980	0.5566	0.0618	-338.0	-362.0	-12.25	-12.5
0.624	0.2093	200	0.6957	-0.8281	-0.8867	0.5176	0.0569	-378.0	-402.0	-11.0	-11.4375
0.6062	0.3140	300	0.7007	-0.5938	-0.6289	0.5078	0.0337	-352.0	-378.0	-12.25	-12.4375
0.6334	0.4186	400	0.7011	-1.2656	-1.3438	0.5176	0.0815	-424.0	-444.0	-11.375	-11.75
0.6236	0.5233	500	0.7273	-1.1172	-1.1875	0.5527	0.0659	-408.0	-430.0	-11.4375	-11.8125
0.648	0.6279	600	0.6997	-1.3438	-1.3984	0.5059	0.0508	-428.0	-452.0	-13.75	-13.625
0.6131	0.7326	700	0.7108	-1.4922	-1.5312	0.5293	0.0396	-442.0	-468.0	-12.8125	-12.625
0.621	0.8373	800	0.7204	-1.3516	-1.4141	0.5371	0.0581	-430.0	-454.0	-14.0625	-14.0625
0.6114	0.9419	900	0.7060	-1.6797	-1.8125	0.5371	0.1328	-470.0	-486.0	-13.875	-13.8125
0.1659	1.0466	1000	0.8400	-2.9688	-3.2188	0.5645	0.2520	-608.0	-616.0	-7.5	-8.5625
0.1767	1.1512	1100	0.9194	-3.0781	-3.2188	0.5156	0.1406	-612.0	-624.0	-12.125	-13.0
0.1574	1.2559	1200	0.9110	-3.8125	-4.0938	0.5332	0.2715	-696.0	-700.0	-11.9375	-12.75
0.1637	1.3605	1300	0.8868	-3.5312	-3.7656	0.5410	0.2314	-664.0	-672.0	-11.25	-12.0
0.1275	1.4652	1400	0.9276	-3.7031	-3.9844	0.5488	0.2754	-688.0	-688.0	-9.0625	-10.1875
0.1468	1.5699	1500	0.9168	-3.9688	-4.1562	0.5352	0.1943	-704.0	-716.0	-10.6875	-11.3125
0.1427	1.6745	1600	0.9187	-4.3125	-4.5625	0.5234	0.2656	-744.0	-748.0	-10.125	-11.0
0.1592	1.7792	1700	0.8701	-4.6875	-5.0312	0.5586	0.3516	-792.0	-784.0	-10.4375	-11.25
0.1341	1.8838	1800	0.9226	-3.9531	-4.2188	0.5391	0.2598	-708.0	-712.0	-9.625	-10.5625
0.1366	1.9885	1900	0.9103	-4.1562	-4.4375	0.5234	0.2754	-732.0	-736.0	-9.9375	-10.75
0.026	2.0931	2000	1.0973	-5.7812	-6.125	0.5254	0.3379	-900.0	-896.0	-6.5	-7.5312
0.0178	2.1978	2100	1.1703	-6.0312	-6.4375	0.5293	0.3867	-932.0	-924.0	-6.2188	-7.2812
0.019	2.3025	2200	1.1800	-6.4062	-6.8125	0.5312	0.4004	-968.0	-960.0	-5.9062	-6.9688
0.0173	2.4071	2300	1.1893	-6.3438	-6.75	0.5293	0.3965	-964.0	-952.0	-5.7188	-6.7812
0.0147	2.5118	2400	1.2635	-6.7188	-7.125	0.5176	0.3926	-1000.0	-992.0	-5.375	-6.4688
0.016	2.6164	2500	1.2629	-6.75	-7.125	0.5195	0.375	-1000.0	-992.0	-5.3125	-6.4062
0.0171	2.7211	2600	1.2716	-6.8438	-7.2188	0.5176	0.3809	-1012.0	-1004.0	-5.125	-6.2188
0.0123	2.8257	2700	1.2615	-6.875	-7.25	0.5195	0.3867	-1016.0	-1008.0	-5.0	-6.0938
0.0198	2.9304	2800	1.2575	-6.8438	-7.25	0.5215	0.3887	-1012.0	-1004.0	-5.0	-6.125
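
One way to read the table is to compare training and validation loss across checkpoints. The sketch below uses a hand-picked subset of four rows copied from the table (steps 100, 900, 1900, and 2800); the selection and the variable names are illustrative, and the same logic applies to the full table.

```python
# (step, training loss, validation loss) copied from four rows of the table.
rows = [
    (100, 0.6224, 0.6811),
    (900, 0.6114, 0.7060),
    (1900, 0.1366, 0.9103),
    (2800, 0.0198, 1.2575),
]

# Checkpoint with the lowest validation loss among the selected rows.
best_step, _, best_val_loss = min(rows, key=lambda r: r[2])

# Gap between validation and training loss at each selected checkpoint.
gaps = [(step, round(val - train, 4)) for step, train, val in rows]

print(best_step, best_val_loss)
print(gaps)
```

In this subset the validation loss is lowest at step 100 and the gap to training loss widens at every later checkpoint, while training loss falls from 0.6224 to 0.0198. That pattern is commonly associated with overfitting, so an earlier checkpoint may be worth evaluating alongside the final one.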

Framework versions

Transformers 4.45.1
PyTorch 2.3.0
Datasets 3.0.1
Tokenizers 0.20.0

CharlesLi/OpenELM-1_1B-DPO-full-most-similar
