Commit a23d9a2, committed by Haoxiang-Wang
1 Parent(s): e1d2459

Update README.md

README.md CHANGED
@@ -5,6 +5,7 @@ This preference model is trained from [LLaMA3-8B-it](meta-llama/Meta-Llama-3-8B-
 
 The dataset is RLHFlow/pair_preference_model_dataset. It achieves Chat-98.6, Char-hard 65.8, Safety 89.6, and reasoning 94.9 in reward bench.
 
+See our paper [RLHF Workflow: From Reward Modeling to Online RLHF](https://arxiv.org/abs/2405.07863) for more details of this model.
 
 ## Service the RM
 
@@ -62,4 +63,26 @@ for chosen_position in [0, 1]:
 avg_prob_chosen = np.mean(probs_chosen)
 correct = 0.5 if avg_prob_chosen == 0.5 else float(avg_prob_chosen > 0.5)
 print(correct)
+```
+
+## Citation
+If you use this model in your research, please consider citing our paper
+```
+@misc{rlhflow,
+title={RLHF Workflow: From Reward Modeling to Online RLHF},
+author={Hanze Dong and Wei Xiong and Bo Pang and Haoxiang Wang and Han Zhao and Yingbo Zhou and Nan Jiang and Doyen Sahoo and Caiming Xiong and Tong Zhang},
+year={2024},
+eprint={2405.07863},
+archivePrefix={arXiv},
+primaryClass={cs.LG}
+}
+```
+and Google's Slic paper (which initially proposes this pairwise preference model)
+```
+@article{zhao2023slic,
+title={Slic-hf: Sequence likelihood calibration with human feedback},
+author={Zhao, Yao and Joshi, Rishabh and Liu, Tianqi and Khalman, Misha and Saleh, Mohammad and Liu, Peter J},
+journal={arXiv preprint arXiv:2305.10425},
+year={2023}
+}
 ```
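Reviewer note on the second hunk's context lines: the README snippet averages the probability assigned to the chosen response over both orderings (`chosen_position in [0, 1]`) and turns the average into a score of 1.0, 0.0, or 0.5 for an exact tie. The sketch below isolates only that final scoring step so it can be checked by hand. The helper name `pairwise_correct` and the example probabilities are hypothetical and do not come from the README; how `probs_chosen` is produced by the model is outside this diff and is not reproduced here.

```python
import numpy as np


def pairwise_correct(probs_chosen):
    """Score one comparison from the per-ordering probabilities that the
    chosen response is preferred (hypothetical helper, not from the README)."""
    # Average over both orderings to reduce position bias.
    avg_prob_chosen = float(np.mean(probs_chosen))
    # Exact tie counts as half credit; otherwise 1.0 if chosen wins, else 0.0.
    return 0.5 if avg_prob_chosen == 0.5 else float(avg_prob_chosen > 0.5)


# Illustrative values only (assumed, not measured model outputs).
print(pairwise_correct([0.9, 0.7]))    # chosen preferred in both orderings
print(pairwise_correct([0.2, 0.6]))    # mean below 0.5, counted as incorrect
print(pairwise_correct([0.25, 0.75]))  # mean exactly 0.5, counted as a tie
```

Averaging before thresholding means a model that flips its preference when the two responses swap positions is scored on its mean confidence across the two orderings rather than on either ordering alone.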