LoLCATS Collection Linearizing LLMs with high quality and efficiency. We linearize the full Llama 3.1 model family (8B, 70B, 405B) for the first time! • 4 items • Updated 9 days ago • 12
To CoT or not to CoT? Chain-of-thought helps mainly on math and symbolic reasoning Paper • 2409.12183 • Published Sep 18 • 36
Enhancing Training Efficiency Using Packing with Flash Attention Paper • 2407.09105 • Published Jul 12 • 13
Power Scheduler: A Batch Size and Token Number Agnostic Learning Rate Scheduler Paper • 2408.13359 • Published Aug 23 • 21
Power-LM Collection Dense & MoE LLMs trained with the Power learning rate scheduler. • 4 items • Updated 5 days ago • 15
Introducing AuraFace: Open-Source Face Recognition and Identity Preservation Models Article • By isidentical • Aug 26 • 35
To Code, or Not To Code? Exploring Impact of Code in Pre-training Paper • 2408.10914 • Published Aug 20 • 40
💻 Local SmolLMs Collection SmolLM models in MLC, ONNX and GGUF format for local applications + in-browser demos • 14 items • Updated Aug 20 • 41
🪐 SmolLM Collection A series of smol LLMs: 135M, 360M, and 1.7B. We release base and Instruct models, as well as the training corpus and some WebGPU demos • 12 items • Updated Aug 18 • 176
A failed experiment: Infini-Attention, and why we should keep trying? Article • Aug 14 • 46
LongWriter: Unleashing 10,000+ Word Generation from Long Context LLMs Paper • 2408.07055 • Published Aug 13 • 65
Tree Attention: Topology-aware Decoding for Long-Context Attention on GPU clusters Paper • 2408.04093 • Published Aug 7 • 4
The case for specialized pre-training: ultra-fast foundation models for dedicated tasks Article • By Pclanglais • Aug 4 • 26
Data Mixture Inference: What do BPE Tokenizers Reveal about their Training Data? Paper • 2407.16607 • Published Jul 23 • 21
LAVE: Zero-shot VQA Evaluation on Docmatix with LLMs - Do We Still Need Fine-Tuning? Article • Jul 25 • 18
MobileLLM: Optimizing Sub-billion Parameter Language Models for On-Device Use Cases Paper • 2402.14905 • Published Feb 22 • 108
Let the Expert Stick to His Last: Expert-Specialized Fine-Tuning for Sparse Architectural Large Language Models Paper • 2407.01906 • Published Jul 2 • 34
The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale Paper • 2406.17557 • Published Jun 25 • 85
Nemotron 4 340B Collection Nemotron-4: open models for Synthetic Data Generation (SDG). Includes Base, Instruct, and Reward models. • 4 items • Updated 22 days ago • 156
Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations Paper • 2405.18392 • Published May 28 • 12
[lecture artifacts] aligning open language models Collection artifacts referenced in the talk timeline! Slides: https://docs.google.com/presentation/d/1quMyI4BAx4rvcDfk8jjv063bmHg4RxZd9mhQloXpMn0/edit?usp=sharin • 63 items • Updated Apr 17 • 56
Secrets of RLHF in Large Language Models Part II: Reward Modeling Paper • 2401.06080 • Published Jan 11 • 25