AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published 1 day ago • 38
Falcon Mamba: The First Competitive Attention-free 7B Language Model Paper • 2410.05355 • Published 15 days ago • 26
Critique-out-Loud Reward Models Collection Paper: https://arxiv.org/abs/2408.11791 | Code: https://github.com/zankner/CLoud • 7 items • Updated Sep 5 • 3
Style over Substance: Failure Modes of LLM Judges in Alignment Benchmarking Paper • 2409.15268 • Published 29 days ago • 11
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 115
Llama-3.1-Storm-8B: Improved SLM with Self-Curation + Model Merging Article • By akjindal53244 • Aug 19 • 73
Agent Q: Advanced Reasoning and Learning for Autonomous AI Agents Paper • 2408.07199 • Published Aug 13 • 20
A failed experiment: Infini-Attention, and why we should keep trying? Article • Aug 14 • 46
Instruction-Following Evaluation for Large Language Models Paper • 2311.07911 • Published Nov 14, 2023 • 19
🍃 MINT-1T Collection Data for "MINT-1T: Scaling Open-Source Multimodal Data by 10x: A Multimodal Dataset with One Trillion Tokens" • 13 items • Updated Jul 24 • 50
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21 • 59
Replacing Judges with Juries: Evaluating LLM Generations with a Panel of Diverse Models Paper • 2404.18796 • Published Apr 29 • 68
SpaceByte: Towards Deleting Tokenization from Large Language Modeling Paper • 2404.14408 • Published Apr 22 • 6
Jack of All Trades, Master of Some, a Multi-Purpose Transformer Agent Paper • 2402.09844 • Published Feb 15 • 20
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing Paper • 2404.12253 • Published Apr 18 • 53
Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study Paper • 2404.10719 • Published Apr 16 • 3
From r to Q^*: Your Language Model is Secretly a Q-Function Paper • 2404.12358 • Published Apr 18 • 2
ORPO: Monolithic Preference Optimization without Reference Model Paper • 2403.07691 • Published Mar 12 • 61
MM1: Methods, Analysis & Insights from Multimodal LLM Pre-training Paper • 2403.09611 • Published Mar 14 • 124
Measuring Multimodal Mathematical Reasoning with MATH-Vision Dataset Paper • 2402.14804 • Published Feb 22 • 2
Branch-Train-MiX: Mixing Expert LLMs into a Mixture-of-Experts LLM Paper • 2403.07816 • Published Mar 12 • 39
GSM-Plus: A Comprehensive Benchmark for Evaluating the Robustness of LLMs as Mathematical Problem Solvers Paper • 2402.19255 • Published Feb 29 • 1
Learning to Generate Instruction Tuning Datasets for Zero-Shot Task Adaptation Paper • 2402.18334 • Published Feb 28 • 12
Functional Benchmarks for Robust Evaluation of Reasoning Performance, and the Reasoning Gap Paper • 2402.19450 • Published Feb 29 • 3
Back to Basics: Revisiting REINFORCE Style Optimization for Learning from Human Feedback in LLMs Paper • 2402.14740 • Published Feb 22 • 8
The Unlocking Spell on Base LLMs: Rethinking Alignment via In-Context Learning Paper • 2312.01552 • Published Dec 4, 2023 • 30
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization Paper • 2402.09320 • Published Feb 14 • 6
Long Is More for Alignment: A Simple but Tough-to-Beat Baseline for Instruction Fine-Tuning Paper • 2402.04833 • Published Feb 7 • 6
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 34
A Minimaximalist Approach to Reinforcement Learning from Human Feedback Paper • 2401.04056 • Published Jan 8 • 2
Possible Meissner effect near room temperature in copper-substituted lead apatite Paper • 2401.00999 • Published Jan 2 • 5
R-Tuning: Teaching Large Language Models to Refuse Unknown Questions Paper • 2311.09677 • Published Nov 16, 2023 • 3
Math-Shepherd: Verify and Reinforce LLMs Step-by-step without Human Annotations Paper • 2312.08935 • Published Dec 14, 2023 • 4
Some things are more CRINGE than others: Preference Optimization with the Pairwise Cringe Loss Paper • 2312.16682 • Published Dec 27, 2023 • 5
What Makes Good Data for Alignment? A Comprehensive Study of Automatic Data Selection in Instruction Tuning Paper • 2312.15685 • Published Dec 25, 2023 • 17
Model Merging Collection Model merging is a popular technique in the LLM space. Here is a chronological list of papers to help you get started with it! • 30 items • Updated Jun 12 • 216
A General Theoretical Paradigm to Understand Learning from Human Preferences Paper • 2310.12036 • Published Oct 18, 2023 • 14
MEDITRON-70B: Scaling Medical Pretraining for Large Language Models Paper • 2311.16079 • Published Nov 27, 2023 • 20
Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 Paper • 2311.10702 • Published Nov 17, 2023 • 18