Haoxiang Wang

Haoxiang-Wang

https://haoxiang-wang.github.io/

AI & ML interests

Machine Learning (Transfer Learning, OOD Generalization, Domain Adaptation, Meta-Learning)

Haoxiang-Wang's activity

New activity in sfairXC/FsfairX-LLaMA3-RM-v0.1 16 days ago

Update README.md

#6 opened 16 days ago by

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 about 2 months ago

Why is the code-complexity coefficient so high in the demo example?

#16 opened about 2 months ago by

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 3 months ago

Special tokens in the vocabulary?

#13 opened 3 months ago by

Original reward space

#15 opened 3 months ago by

[AUTOMATED] Model Memory Requirements

#5 opened 5 months ago by

model-sizer-bot

What is the range of the output score from the model?

#12 opened 3 months ago by

Why is `multi_obj_rewards` multiplied by 5, but then 0.5 is subtracted from it?

#11 opened 3 months ago by

Update README.md

#3 opened 5 months ago by

Issue when finetuning the reward model on custom dataset

#2 opened 5 months ago by

Longer context

#10 opened 3 months ago by

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 4 months ago

batched predictions with padding through the model don't seem to work correctly

#7 opened 4 months ago by

ModuleNotFoundError: No module named 'transformers_modules.RLHFlow.ArmoRM-Llama3-8B-v0'

#6 opened 4 months ago by

Why Not Utilize a Sigmoid Function in the Regression Layer?

#8 opened 4 months ago by

New activity in allenai/reward-bench 5 months ago

Separate Scores: With & Without Prior Sets

#6 opened 5 months ago by

New activity in RLHFlow/ArmoRM-Llama3-8B-v0.1 5 months ago

Problem running the model

#1 opened 5 months ago by

New activity in RLHFlow/LLaMA3-iterative-DPO-final 5 months ago

exl2 quants

#2 opened 5 months ago by

New activity in RLHFlow/pair-preference-model-LLaMA3-8B 5 months ago

Can you specify the license for this model, please?

#1 opened 5 months ago by

commented a paper 6 months ago

RLHF Workflow: From Reward Modeling to Online RLHF

Paper • 2405.07863 • Published May 13 • 67

New activity in prometheus-eval/Feedback-Bench 7 months ago

Data Description

#2 opened 7 months ago by

New activity in prometheus-eval/Preference-Bench 7 months ago

Data Description

#2 opened 7 months ago by