Quentin Gallouédec

qgallouedec

https://gallouedec.com

Articles

Preference Optimization for Vision Language Models

Jul 10

• 40

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Apr 22

• 78

qgallouedec's activity

upvoted a paper 18 days ago

The Perfect Blend: Redefining RLHF with Mixture of Judges

Paper • 2409.20370 • Published 22 days ago • 4

upvoted a paper 22 days ago

Contrastive Preference Optimization: Pushing the Boundaries of LLM Performance in Machine Translation

Paper • 2401.08417 • Published Jan 16 • 31

upvoted a collection 22 days ago

PaliGemma Release

Collection

Pretrained and mix checkpoints for PaliGemma • 16 items • Updated Jul 31 • 137

upvoted 3 papers 26 days ago

Exploratory Preference Optimization: Harnessing Implicit Q*-Approximation for Sample-Efficient RLHF

Paper • 2405.21046 • Published May 31 • 3

ORPO: Monolithic Preference Optimization without Reference Model

Paper • 2403.07691 • Published Mar 12 • 61

Binary Classifier Optimization for Large Language Model Alignment

Paper • 2404.04656 • Published Apr 6 • 2

upvoted a paper about 2 months ago

Building and better understanding vision-language models: insights and future directions

Paper • 2408.12637 • Published Aug 22 • 115

upvoted a paper 2 months ago

Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs

Paper • 2402.14740 • Published Feb 22 • 8

upvoted an article 2 months ago

Article

The 5 Most Under-Rated Tools on Hugging Face

Aug 22

• 84

upvoted 2 papers 2 months ago

Helping or Herding? Reward Model Ensembles Mitigate but do not Eliminate Reward Hacking

Paper • 2312.09244 • Published Dec 14, 2023 • 5

Anchored Preference Optimization and Contrastive Revisions: Addressing Underspecification in Alignment

Paper • 2408.06266 • Published Aug 12 • 9

upvoted 2 papers 3 months ago

A Rank Stabilization Scaling Factor for Fine-Tuning with LoRA

Paper • 2312.03732 • Published Nov 28, 2023 • 7

The Curious Case of Neural Text Degeneration

Paper • 1904.09751 • Published Apr 22, 2019 • 3

upvoted an article 3 months ago

Article

Putting RL back in RLHF

Jun 12

• 62

upvoted a paper 3 months ago

Understanding Reference Policies in Direct Preference Optimization

Paper • 2407.13709 • Published Jul 18 • 16

upvoted 3 articles 3 months ago

Article

Docmatix - a huge dataset for Document Visual Question Answering

Jul 18

• 66

Article

How NuminaMath Won the 1st AIMO Progress Prize

Jul 11

• 95

Article

Preference Optimization for Vision Language Models

Jul 10

• 40

upvoted 4 papers 4 months ago

Cleanba: A Reproducible and Efficient Distributed Reinforcement Learning Platform

Paper • 2310.00036 • Published Sep 29, 2023 • 2

CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms

Paper • 2111.08819 • Published Nov 16, 2021 • 2

Improve Mathematical Reasoning in Language Models by Automated Process Supervision

Paper • 2406.06592 • Published Jun 5 • 24

Accessing GPT-4 level Mathematical Olympiad Solutions via Monte Carlo Tree Self-refine with LLaMa-3 8B

Paper • 2406.07394 • Published Jun 11 • 21

upvoted a paper 5 months ago

LIMA: Less Is More for Alignment

Paper • 2305.11206 • Published May 18, 2023 • 21

upvoted an article 5 months ago

Article

2024-04-22 - Hub Incident Post Mortem

May 17

• 17

upvoted a collection 5 months ago

SimPO

Collection

This collection contains a list of SimPO and baseline models. • 49 items • Updated Sep 5 • 13

upvoted an article 5 months ago

Article

Introducing IDEFICS: An Open Reproduction of State-of-the-art Visual Language Model

Aug 22, 2023

• 26

upvoted a collection 5 months ago

Idefics2 🐶

Collection

Idefics2-8B is a foundation vision-language model. In this collection, you will find the models, datasets and demo related to its creation. • 11 items • Updated May 6 • 88

upvoted a paper 6 months ago

Iterative Reasoning Preference Optimization

Paper • 2404.19733 • Published Apr 30 • 46

upvoted a collection 6 months ago

Preference Datasets for DPO

Collection

This collection contains a list of curated preference datasets for DPO fine-tuning for intent alignment of LLMs • 7 items • Updated Jul 30 • 29

upvoted 3 articles 6 months ago

Article

Don't repeat yourself - 🤗 Transformers Design Philosophy

Apr 5, 2022

• 11

Article

Public Policy at Hugging Face

Apr 8

• 19

Article

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Apr 22

• 78

upvoted 2 papers 6 months ago

Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent

Paper • 2402.09844 • Published Feb 15 • 20

A Generalist Agent

Paper • 2205.06175 • Published May 12, 2022 • 3

upvoted 2 articles 6 months ago

Article

Welcome Llama 3 - Meta's new open LLM

Apr 18

• 275

Article

Introducing the LiveCodeBench Leaderboard - Holistic and Contamination-Free Evaluation of Code LLMs

Apr 16

• 14