Haoxiang-Wang
committed on
Commit e27e178
1 Parent(s): c128eb7
Update README.md
README.md
CHANGED
@@ -1,5 +1,5 @@
 ---
-
+license: llama3
 ---
 This preference model is trained from [LLaMA3-8B-it](meta-llama/Meta-Llama-3-8B-Instruct) with the training script at [Reward Modeling](https://github.com/RLHFlow/RLHF-Reward-Modeling/tree/pm_dev/pair-pm).
 