view article Article MedEmbed: Fine-Tuned Embedding Models for Medical / Clinical IR By abhinand • 2 days ago • 27
view article Article Advanced Flux Dreambooth LoRA Training with 🧨 diffusers By linoyts • 1 day ago • 27
Granite 3.0 models Collection A series of language models trained by IBM licensed under Apache 2.0 license. We release both the base pretrained and instruct models. • 8 items • Updated about 20 hours ago • 54
view article Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5 • 139
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 27 days ago • 386
Molmo Collection Artifacts for open multimodal language models. • 5 items • Updated 27 days ago • 257
view article Article Does Daily Software Engineering Work Need Reasoning Models? By onekq • 29 days ago • 5
Leaderboards and benchmarks ✨ Collection Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 71 items • Updated about 12 hours ago • 87
DataGemma Release Collection A series of pioneering open models that help ground LLMs in real-world data through Data Commons. • 2 items • Updated Sep 12 • 76
OLMoE Collection Artifacts for open mixture-of-experts language models. • 13 items • Updated 28 days ago • 24
view article Article A Gentle Introduction to 8-bit Matrix Multiplication for transformers at scale using transformers, accelerate and bitsandbytes Aug 17, 2022 • 58
Magpie: Alignment Data Synthesis from Scratch by Prompting Aligned LLMs with Nothing Paper • 2406.08464 • Published Jun 12 • 62
The Carbon Footprint of Machine Learning Training Will Plateau, Then Shrink Paper • 2204.05149 • Published Apr 11, 2022 • 7
Gemma Scope Release Collection A comprehensive, open suite of sparse autoencoders for Gemma 2 2B and 9B. • 10 items • Updated Aug 11 • 13
Llama 3.1 Evals Collection This collection provides detailed information on how we derived the reported benchmark metrics for the Llama 3.1 models, including the configurations, • 6 items • Updated 27 days ago • 16
Model Merging Collection Model Merging is a very popular technique nowadays in LLM. Here is a chronological list of papers on the space that will help you get started with it! • 30 items • Updated Jun 12 • 216
NuminaMath Collection Datasets and models for training SOTA math LLMs. See our GitHub for training & inference code: https://github.com/project-numina/aimo-progress-prize • 6 items • Updated Jul 21 • 59
Scaling Synthetic Data Creation with 1,000,000,000 Personas Paper • 2406.20094 • Published Jun 28 • 94
view article Article SeeMoE: Implementing a MoE Vision Language Model from Scratch By AviSoori1x • Jun 23 • 33
G-Rank: Unsupervised Continuous Learn-to-Rank for Edge Devices in a P2P Network Paper • 2301.12530 • Published Jan 29, 2023 • 1
DataComp-LM: In search of the next generation of training sets for language models Paper • 2406.11794 • Published Jun 17 • 48
Routing to the Expert: Efficient Reward-guided Ensemble of Large Language Models Paper • 2311.08692 • Published Nov 15, 2023 • 12
FIFO-Diffusion: Generating Infinite Videos from Text without Training Paper • 2405.11473 • Published May 19 • 53
view article Article makeMoE: Implement a Sparse Mixture of Experts Language Model from Scratch By AviSoori1x • May 7 • 38
view article Article Unbelievable! Run 70B LLM Inference on a Single 4GB GPU with This NEW Technique By lyogavin • Nov 30, 2023 • 18
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention Paper • 2404.07143 • Published Apr 10 • 103
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Paper • 2404.10667 • Published Apr 16 • 15
Be Yourself: Bounded Attention for Multi-Subject Text-to-Image Generation Paper • 2403.16990 • Published Mar 25 • 24
Adapting Large Language Models via Reading Comprehension Paper • 2309.09530 • Published Sep 18, 2023 • 75
The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits Paper • 2402.17764 • Published Feb 27 • 601
Sora: A Review on Background, Technology, Limitations, and Opportunities of Large Vision Models Paper • 2402.17177 • Published Feb 27 • 88
Neural Circuit Diagrams: Robust Diagrams for the Communication, Implementation, and Analysis of Deep Learning Architectures Paper • 2402.05424 • Published Feb 8 • 17
SDXL-Lightning: Progressive Adversarial Diffusion Distillation Paper • 2402.13929 • Published Feb 21 • 27
GES: Generalized Exponential Splatting for Efficient Radiance Field Rendering Paper • 2402.10128 • Published Feb 15 • 14
OpenMathInstruct-1: A 1.8 Million Math Instruction Tuning Dataset Paper • 2402.10176 • Published Feb 15 • 34
GraphCast: Learning skillful medium-range global weather forecasting Paper • 2212.12794 • Published Dec 24, 2022 • 1