Tanvir1337 (Tanvir)

upvoted 4 articles about 16 hours ago

Article

MedEmbed: Fine-Tuned Embedding Models for Medical / Clinical IR

2 days ago

• 27

Article

Mamba Out

4 days ago

• 8

Article

AI is turning nuclear: a review

3 days ago

• 10

Article

Advanced Flux Dreambooth LoRA Training with 🧨 diffusers

1 day ago

• 27

upvoted a collection 1 day ago

Granite 3.0 models

Collection

A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated about 20 hours ago • 54

upvoted a paper 10 days ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18 • 123

upvoted an article 14 days ago

Article

Accelerate 1.0.0

Sep 13

• 48

upvoted an article 20 days ago

Article

Practical 3D Asset Generation: A Step-by-Step Guide

Aug 1, 2023

• 5

upvoted a collection 23 days ago

Emu3

Collection

4 items • Updated about 9 hours ago • 60

upvoted an article 26 days ago

Article

ColPali: Efficient Document Retrieval with Vision Language Models 👀

Jul 5

• 139

upvoted 2 collections 27 days ago

Llama 3.2

Collection

This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 27 days ago • 386

Molmo

Collection

Artifacts for open multimodal language models. • 5 items • Updated 27 days ago • 257

upvoted an article 28 days ago

Article

Does Daily Software Engineering Work Need Reasoning Models?

29 days ago

• 5

upvoted 2 articles about 1 month ago

Article

Deploy Meta Llama 3.1 405B on Google Cloud Vertex AI

Aug 19

• 18

Article

🧨 Diffusers welcomes Stable Diffusion 3

Jun 12

• 87

upvoted 2 collections about 1 month ago

Leaderboards and benchmarks ✨

Collection

Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 71 items • Updated about 12 hours ago • 87

DataGemma Release

Collection

A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Sep 12 • 76

upvoted an article about 2 months ago

Article

🪆 Introduction to Matryoshka Embedding Models

Feb 23

• 50

upvoted a collection about 2 months ago

OLMoE

Collection

Artifacts for open mixture-of-experts language models. • 13 items • Updated 28 days ago • 24

upvoted an article 2 months ago

Article

The 5 Most Under-Rated Tools on Hugging Face

Aug 22

• 84

upvoted a paper 2 months ago

Qwen2-Audio Technical Report

Paper • 2407.10759 • Published Jul 15 • 54

upvoted 3 articles 3 months ago

Article

Google releases Gemma 2 2B, ShieldGemma and Gemma Scope

Jul 31

• 59

Article

A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes

Aug 17, 2022

• 58

Article

Introduction to Quantization cooked in 🤗 with 💗🧑‍🍳

Aug 25, 2023

• 18

upvoted 2 papers 3 months ago

Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing

Paper • 2406.08464 • Published Jun 12 • 62

The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink

Paper • 2204.05149 • Published Apr 11, 2022 • 7

upvoted a collection 3 months ago

Gemma Scope Release

Collection

A comprehensive, open suite of sparse autoencoders for Gemma 2 2B and 9B. • 10 items • Updated Aug 11 • 13

upvoted 2 articles 3 months ago

Article

SegMoE: Segmind Mixture of Diffusion Experts

Feb 3

• 6

Article

Mixture of Experts Explained

Dec 11, 2023

• 171

upvoted 2 collections 3 months ago

Llama 3.1 Evals

Collection

This collection provides detailed information on how we derived the reported benchmark metrics for the Llama 3.1 models, including the configurations, • 6 items • Updated 27 days ago • 16

Model Merging

Collection

Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12 • 216

upvoted 2 articles 3 months ago

Article

Merge Large Language Models with mergekit

Jan 9

• 73

Article

SmolLM - blazingly fast and remarkably powerful

Jul 16

• 248

upvoted a collection 3 months ago

NuminaMath

Collection

Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21 • 59

upvoted an article 3 months ago

Article

Train a Llama model from scratch

Jul 29

• 44

upvoted a paper 4 months ago

Scaling Synthetic Data Creation with 1,000,000,000 Personas

Paper • 2406.20094 • Published Jun 28 • 94

upvoted an article 4 months ago

Article

SeeMoE: Implementing a MoE Vision Language Model from Scratch

Jun 23

• 33

upvoted 4 papers 4 months ago

DataComp-LM: In search of the next generation of training sets for language models

Paper • 2406.11794 • Published Jun 17 • 48

upvoted an article 5 months ago

Article

Uncensor any LLM with abliteration

Jun 13

• 350

upvoted 2 papers 5 months ago

Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models

Paper • 2311.08692 • Published Nov 15, 2023 • 12

FIFO-Diffusion: Generating Infinite Videos from Text without Training

Paper • 2405.11473 • Published May 19 • 53

upvoted 2 articles 5 months ago

Article

makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch

May 7

• 38

Article

Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique

Nov 30, 2023

• 18

upvoted 5 papers 6 months ago

Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention

Paper • 2404.07143 • Published Apr 10 • 103

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Paper • 2404.10667 • Published Apr 16 • 15

Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation

Paper • 2403.16990 • Published Mar 25 • 24

TransformerFAM: Feedback attention is working memory

Paper • 2404.09173 • Published Apr 14 • 43

Grandmaster-Level Chess Without Search

Paper • 2402.04494 • Published Feb 7 • 67

upvoted a paper 7 months ago

ReALM: Reference Resolution As Language Modeling

Paper • 2403.20329 • Published Mar 29 • 20

upvoted 8 papers 8 months ago

Adapting Large Language Models via Reading Comprehension

Paper • 2309.09530 • Published Sep 18, 2023 • 75

The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits

Paper • 2402.17764 • Published Feb 27 • 601

Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models

Paper • 2402.17177 • Published Feb 27 • 88

Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures

Paper • 2402.05424 • Published Feb 8 • 17

SDXL-Lightning: Progressive Adversarial Diffusion Distillation

Paper • 2402.13929 • Published Feb 21 • 27

GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering

Paper • 2402.10128 • Published Feb 15 • 14

OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset

Paper • 2402.10176 • Published Feb 15 • 34

GraphCast: Learning skillful medium-range global weather forecasting

Paper • 2212.12794 • Published Dec 24, 2022 • 1
