Marc Sun

marcsun13

AI & ML interests

LLM, Quantization, Training, Inference

Articles

marcsun13's activity

upvoted an article 6 days ago

Article

Fixing Gradient Accumulation

7 days ago

• 32

upvoted 3 articles about 1 month ago

Article

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Sep 18

• 177

Article

Accelerate 1.0.0

Sep 13

• 48

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 248

upvoted an article 3 months ago

Article

XetHub is joining Hugging Face!

Aug 8

• 79

upvoted an article 5 months ago

Article

Benchmarking Text Generation Inference

May 29

• 27

upvoted a paper 5 months ago

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Paper • 2405.18392 • Published May 28 • 12

upvoted an article 5 months ago

Article

License to Call: Introducing Transformers Agents 2.0

May 13

• 114

upvoted a paper 6 months ago

BitNet: Scaling 1-bit Transformers for Large Language Models

Paper • 2310.11453 • Published Oct 17, 2023 • 96

upvoted an article 6 months ago

Article

Welcome Llama 3 - Meta's new open LLM

Apr 18

• 275

upvoted a collection 6 months ago

Meta Llama 3

Collection

This collection hosts the transformers and original repos of the Meta Llama 3 and Llama Guard 2 releases • 5 items • Updated 27 days ago • 680

upvoted 3 articles 6 months ago

Article

Vision Language Models Explained

Apr 11

• 193

Article

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Apr 15

• 163

Article

Overview of natively supported quantization schemes in 🤗 Transformers

Sep 12, 2023

• 10

upvoted 7 articles 7 months ago

Article

Making LLMs lighter with AutoGPTQ and transformers

Aug 23, 2023

• 30

Article

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Aug 17, 2022

• 58

Article

Making LLMs even more accessible with bitsandbytes, 4-bit quantization and QLoRA

May 24, 2023

• 84

Article

Binary and Scalar Embedding Quantization for Significantly Faster & Cheaper Retrieval

Mar 22

• 58

Article

quanto: a pytorch quantization toolkit

Mar 18

• 28

Article

GaLore: Advancing Large Model Training on Consumer-grade Hardware

Mar 20

• 25

Article

Outpainting II - Differential Diffusion

Apr 23

• 47

upvoted 2 papers 7 months ago

KIVI: A Tuning-Free Asymmetric 2bit Quantization for KV Cache

Paper • 2402.02750 • Published Feb 5 • 3

Unlocking the conversion of Web Screenshots into HTML Code with the WebSight Dataset

Paper • 2403.09029 • Published Mar 14 • 54

upvoted 3 papers 8 months ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29 • 134

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 601

BiLLM: Pushing the Limit of Post-Training Quantization for LLMs

Paper • 2402.04291 • Published Feb 6 • 48

upvoted 3 papers 10 months ago

Mixtral of Experts

Paper • 2401.04088 • Published Jan 8 • 157

TinyLlama: An Open-Source Small Language Model

Paper • 2401.02385 • Published Jan 4 • 89

LLM in a flash: Efficient Large Language Model Inference with Limited Memory

Paper • 2312.11514 • Published Dec 12, 2023 • 257

upvoted a collection 10 months ago

BLIP models

Collection

A collection of all BLIP models • 8 items • Updated 14 days ago • 19

upvoted a paper 10 months ago

ZeroQuant(4+2): Redefining LLMs Quantization with a New FP6-Centric Strategy for Diverse Generative Tasks

Paper • 2312.08583 • Published Dec 14, 2023 • 9

upvoted 2 papers 11 months ago

Effective Quantization for Diffusion Models on CPUs

Paper • 2311.16133 • Published Nov 2, 2023 • 4

Zephyr: Direct Distillation of LM Alignment

Paper • 2310.16944 • Published Oct 25, 2023 • 120

upvoted a collection about 1 year ago

Recent models: last 100 repos, sorted by creation date

Collection

The last 100 repos I have created. Sorted by creation date descending, so the most recently created repos appear at the top. • 121 items • Updated Jan 31 • 498

upvoted 2 papers about 1 year ago

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 43

GPTQ: Accurate Post-Training Quantization for Generative Pre-trained Transformers

Paper • 2210.17323 • Published Oct 31, 2022 • 7

Articles

Fine-tuning LLMs to 1.58bit: extreme quantization made easy

Accelerate 1.0.0

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

quanto: a pytorch quantization toolkit

Overview of natively supported quantization schemes in 🤗 Transformers

Making LLMs lighter with AutoGPTQ and transformers
