Pierre Dulac

dulacp

dulacp's activity

upvoted a collection 3 days ago

Llama-3.1-Nemotron-70B

Collection

SOTA models on Arena Hard and RewardBench as of 1 Oct 2024. • 6 items • Updated 7 days ago • 113

upvoted 2 papers 8 days ago

Pixtral 12B

Paper • 2410.07073 • Published 13 days ago • 57

Baichuan-Omni Technical Report

Paper • 2410.08565 • Published 11 days ago • 80

upvoted an article 13 days ago

ColPali: Efficient Document Retrieval with Vision Language Models 👀

Article • Jul 5 • 139

upvoted a paper 17 days ago

Large Language Models as Markov Chains

Paper • 2410.02724 • Published 19 days ago • 31

upvoted a collection 26 days ago

Molmo

Collection

Artifacts for open multimodal language models. • 5 items • Updated 27 days ago • 257

upvoted an article 27 days ago

Llama can now see and run on your device - welcome Llama 3.2

Article • 28 days ago • 153

upvoted 3 articles about 1 month ago

Uncensor any LLM with abliteration

Article • Jun 13 • 350

TGI Multi-LoRA: Deploy Once, Serve 30 Models

Article • Jul 18 • 44

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Article • Sep 18 • 177

upvoted 2 papers about 1 month ago

General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model

Paper • 2409.01704 • Published Sep 3 • 80

INTRA: Interaction Relationship-aware Weakly Supervised Affordance Grounding

Paper • 2409.06210 • Published Sep 10 • 24

upvoted a paper about 2 months ago

Controllable Text Generation for Large Language Models: A Survey

Paper • 2408.12599 • Published Aug 22 • 62

upvoted a collection 4 months ago

LLM Compiler

Collection

Meta LLM Compiler is a state-of-the-art LLM that builds upon Code Llama with improved performance for code optimization and compiler reasoning. • 4 items • Updated Jun 27 • 147

upvoted 4 papers 5 months ago

upvoted 2 articles 6 months ago

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Article • Aug 17, 2022 • 58

Post-OCR-Correction: 1 billion words dataset of automated OCR correction by LLM

Article • Apr 26 • 14

upvoted 8 papers 6 months ago

Make Your LLM Fully Utilize the Context

Paper • 2404.16811 • Published Apr 25 • 52

SnapKV: LLM Knows What You are Looking for Before Generation

Paper • 2404.14469 • Published Apr 22 • 23

Transformers Can Represent n-gram Language Models

Paper • 2404.14994 • Published Apr 23 • 18

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 124

Reuse Your Rewards: Reward Model Transfer for Zero-Shot Cross-Lingual Alignment

Paper • 2404.12318 • Published Apr 18 • 14

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 251

RecurrentGemma: Moving Past Transformers for Efficient Open Language Models

Paper • 2404.07839 • Published Apr 11 • 41

Elucidating the Design Space of Diffusion-Based Generative Models

Paper • 2206.00364 • Published Jun 1, 2022 • 13

upvoted an article 6 months ago

LoRA training scripts of the world, unite!

Article • Jan 2 • 42

upvoted 6 papers 7 months ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4 • 89

Soaring from 4K to 400K: Extending LLM's Context with Activation Beacon

Paper • 2401.03462 • Published Jan 7 • 26

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 120

Yi: Open Foundation Models by 01.AI

Paper • 2403.04652 • Published Mar 7 • 62

Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context

Paper • 2403.05530 • Published Mar 8 • 59

Stealing Part of a Production Language Model

Paper • 2403.06634 • Published Mar 11 • 90

upvoted a paper 8 months ago

LM-Infinite: Simple On-the-Fly Length Generalization for Large Language Models

Paper • 2308.16137 • Published Aug 30, 2023 • 39

upvoted a collection 8 months ago

Long context

Collection

94 items • Updated 24 days ago • 29

upvoted 15 papers 8 months ago

YaRN: Efficient Context Window Extension of Large Language Models

Paper • 2309.00071 • Published Aug 31, 2023 • 65

Linear Transformers with Learnable Kernel Functions are Better In-Context Models

Paper • 2402.10644 • Published Feb 16 • 78

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 601

Mixtures of Experts Unlock Parameter Scaling for Deep RL

Paper • 2402.08609 • Published Feb 13 • 34

Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models

Paper • 2402.13064 • Published Feb 20 • 46

In Search of Needles in a 10M Haystack: Recurrent Memory Finds What LLMs Miss

Paper • 2402.10790 • Published Feb 16 • 40

Scaling Transformer to 1M tokens and beyond with RMT

Paper • 2304.11062 • Published Apr 19, 2023 • 2

Recurrent Memory Transformer

Paper • 2207.06881 • Published Jul 14, 2022 • 1

Transformers Can Achieve Length Generalization But Not Robustly

Paper • 2402.09371 • Published Feb 14 • 12

Premise Order Matters in Reasoning with Large Language Models

Paper • 2402.08939 • Published Feb 14 • 25

Aya Model: An Instruction Finetuned Open-Access Multilingual Language Model

Paper • 2402.07827 • Published Feb 12 • 45

A Human-Inspired Reading Agent with Gist Memory of Very Long Contexts

Paper • 2402.09727 • Published Feb 15 • 35

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Paper • 2402.10176 • Published Feb 15 • 34

Data Engineering for Scaling Language Models to 128K Context

Paper • 2402.10171 • Published Feb 15 • 21

Buffer Overflow in Mixture of Experts

Paper • 2402.05526 • Published Feb 8 • 8

upvoted 8 papers 9 months ago

EVA-CLIP-18B: Scaling CLIP to 18 Billion Parameters

Paper • 2402.04252 • Published Feb 6 • 25

Repeat After Me: Transformers are Better than State Space Models at Copying

Paper • 2402.01032 • Published Feb 1 • 22

Learning Universal Predictors

Paper • 2401.14953 • Published Jan 26 • 18

SliceGPT: Compress Large Language Models by Deleting Rows and Columns

Paper • 2401.15024 • Published Jan 26 • 68

Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding

Paper • 2401.04398 • Published Jan 9 • 20

LMDX: Language Model-based Document Information Extraction and Localization

Paper • 2309.10952 • Published Sep 19, 2023 • 65

LLaMA Beyond English: An Empirical Study on Language Capability Transfer

Paper • 2401.01055 • Published Jan 2 • 53

Self-Play Fine-Tuning Converts Weak Language Models to Strong Language Models

Paper • 2401.01335 • Published Jan 2 • 64