Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations Paper • 2410.10792 • Published 8 days ago • 26
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models Paper • 2410.02416 • Published 19 days ago • 25
Loradex Highlights Collection This collection features awesome opensource LoRAs trained by members of the Glif Community during Loradex Early Access! • 14 items • Updated 4 days ago • 17
Article Getty Images Brings High-Quality, Commercially Safe Dataset to Hugging Face By andreagagliano • Sep 6 • 16
Article Enhancing Image Model Dreambooth Training Through Effective Captioning: Key Observations By alvdansen • Jun 19 • 17
Article Introducing AuraFace: Open-Source Face Recognition and Identity Preservation Models By isidentical • Aug 26 • 35
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12 • 35
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115
IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts Paper • 2408.03209 • Published Aug 6 • 21
Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions Paper • 2407.06723 • Published Jul 9 • 10
Chameleon Collection Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. • 2 items • Updated Jul 9 • 25
MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data Paper • 2406.18790 • Published Jun 26 • 33
DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation Paper • 2208.12242 • Published Aug 25, 2022 • 10
Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation Paper • 2406.06525 • Published Jun 10 • 65
CommonCatalog Collection Common Catalog, a dataset with Creative Commons licensed images and machine-generated caption pairs • 8 items • Updated May 16 • 14
CommonCanvas Collection Collection of models trained on the CommonCatalogue datasets • 8 items • Updated May 16 • 9
Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding Paper • 2405.08748 • Published May 14 • 19
Perturbed Attention Guidance pipelines Collection Pipelines for Perturbed Attention Guidance with 🧨 library • 8 items • Updated Jun 26 • 6
From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation Paper • 2404.15267 • Published Apr 23 • 4
OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework Paper • 2404.14619 • Published Apr 22 • 124
HiDiffusion: Unlocking High-Resolution Creativity and Efficiency in Low-Resolution Trained Diffusion Models Paper • 2311.17528 • Published Nov 29, 2023 • 4
Leaderboards and benchmarks ✨ Collection Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 71 items • Updated about 9 hours ago • 87
Phi-3 Collection Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 27 items • Updated Sep 18 • 480
Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone Paper • 2404.14219 • Published Apr 22 • 251
Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis Paper • 2404.13686 • Published Apr 21 • 27
VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time Paper • 2404.10667 • Published Apr 16 • 15
HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach Paper • 2404.01094 • Published Apr 1 • 4
Natural language guidance of high-fidelity text-to-speech with synthetic annotations Paper • 2402.01912 • Published Feb 2 • 11
HF-curated models available on Workers AI Collection A collection of models curated with Hugging Face that can be run on Cloudflare's Workers AI serverless inference platform. • 15 items • Updated Apr 2 • 51
🎭 Avatars Collection The latest AI-powered technologies usher in a new era of realistic avatars! 🚀 • 69 items • Updated 1 day ago • 74
SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion Paper • 2403.12008 • Published Mar 18 • 19
Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation Paper • 2403.12015 • Published Mar 18 • 63
ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment Paper • 2403.05135 • Published Mar 8 • 42
Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference Paper • 2403.04132 • Published Mar 7 • 38
PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation Paper • 2403.04692 • Published Mar 7 • 40
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis Paper • 2403.03206 • Published Mar 5 • 56
Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers Paper • 2402.19479 • Published Feb 29 • 32
Gemma release Collection Groups the Gemma models released by the Google team. • 40 items • Updated Jul 31 • 325
Text-to-Image Base Models Collection All text-to-image open source base models, with their respective license • 28 items • Updated May 10 • 20
MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models Paper • 2402.06178 • Published Feb 9 • 13
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models Paper • 2402.03300 • Published Feb 5 • 67