The Perfect Blend: Redefining RLHF with Mixture of Judges Paper • 2409.20370 • Published 22 days ago • 4
Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation Paper • 2401.08417 • Published Jan 16 • 31
PaliGemma Release Collection Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 137
Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF Paper • 2405.21046 • Published May 31 • 3
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 61
Binary Classifier Optimization for Large Language Model Alignment Paper • 2404.04656 • Published Apr 6 • 2
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 115
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22 • 8
Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking Paper • 2312.09244 • Published Dec 14, 2023 • 5
Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment Paper • 2408.06266 • Published Aug 12 • 9
A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA Paper • 2312.03732 • Published Nov 28, 2023 • 7
Understanding Reference Policies in Direct Preference Optimization Paper • 2407.13709 • Published Jul 18 • 16
Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform Paper • 2310.00036 • Published Sep 29, 2023 • 2
CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms Paper • 2111.08819 • Published Nov 16, 2021 • 2
Improve Mathematical Reasoning in Language Models by Automated Process Supervision Paper • 2406.06592 • Published Jun 5 • 24
Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B Paper • 2406.07394 • Published Jun 11 • 21
SimPO Collection This collections contains a list of SimPO and baseline models. • 49 items • Updated Sep 5 • 13
view article Article Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model Aug 22, 2023 • 26
Idefics2 🐶 Collection Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 88
Preference Datasets for DPO Collection This collection contains a list of curated preference datasets for DPO fine-tuning for intent alignment of LLMs • 7 items • Updated Jul 30 • 29
view article Article Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Apr 22 • 78
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Paper • 2402.09844 • Published Feb 15 • 20
view article Article Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs Apr 16 • 14