
Apolinário from multimodal AI art PRO

multimodalart


https://multimodal.art

AI & ML interests

None yet

Articles

🧨 Diffusers welcomes Stable Diffusion 3.5 Large

about 18 hours ago

🧨 Diffusers welcomes Stable Diffusion 3

LoRA training scripts of the world, unite!

SDXL in 4 steps with Latent Consistency LoRAs

Running IF with 🧨 diffusers on a Free Tier Google Colab

Train your ControlNet with diffusers


multimodalart's activity

upvoted a collection about 4 hours ago

Stable Diffusion 3.5

4 items • Updated about 4 hours ago • 13

upvoted a paper 8 days ago

Semantic Image Inversion and Editing using Rectified Stochastic Differential Equations

Paper • 2410.10792 • Published 8 days ago • 26

upvoted a paper 16 days ago

Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models

Paper • 2410.02416 • Published 19 days ago • 25

upvoted 2 collections 25 days ago

Loradex Highlights

This collection features awesome opensource LoRAs trained by members of the Glif Community during Loradex Early Access! • 14 items • Updated 4 days ago • 17

Emu3

4 items • Updated about 6 hours ago • 60

upvoted 2 articles about 2 months ago

Article

Getty Images Brings High-Quality, Commercially Safe Dataset to Hugging Face

Sep 6 • 16

Article

Enhancing Image Model Dreambooth Training Through Effective Captioning: Key Observations

Jun 19 • 17

upvoted a collection about 2 months ago

CogVideo

7 items • Updated Sep 18 • 22

upvoted an article about 2 months ago

Article

Introducing AuraFace: Open-Source Face Recognition and Identity Preservation Models

Aug 26 • 35

upvoted 3 papers 2 months ago

Imagen 3

Paper • 2408.07009 • Published Aug 13 • 60

CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer

Paper • 2408.06072 • Published Aug 12 • 35

The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

Paper • 2408.06292 • Published Aug 12 • 115

upvoted 5 papers 3 months ago

IPAdapter-Instruct: Resolving Ambiguity in Image-based Conditioning using Instruct Prompts

Paper • 2408.03209 • Published Aug 6 • 21

Discrete Flow Matching

Paper • 2407.15595 • Published Jul 22 • 11

Scaling Diffusion Transformers to 16 Billion Parameters

Paper • 2407.11633 • Published Jul 16 • 25

Qwen2 Technical Report

Paper • 2407.10671 • Published Jul 15 • 155

Graph-Based Captioning: Enhancing Visual Descriptions by Interconnecting Region Captions

Paper • 2407.06723 • Published Jul 9 • 10

upvoted a collection 3 months ago

Chameleon

Repository for Meta Chameleon, a mixed-modal early-fusion foundation model from FAIR. • 2 items • Updated Jul 9 • 25

upvoted an article 3 months ago

Article

How to run Gemini Nano locally in your browser

Jul 11 • 42

upvoted 2 articles 4 months ago

Article

Google Cloud TPUs made available to Hugging Face users

Jul 9 • 19

Article

How I train a LoRA: m3lt style training overview

Jul 1 • 47

upvoted 2 papers 4 months ago

MUMU: Bootstrapping Multimodal Image Generation from Text-to-Image Data

Paper • 2406.18790 • Published Jun 26 • 33

DreamBooth: Fine Tuning Text-to-Image Diffusion Models for Subject-Driven Generation

Paper • 2208.12242 • Published Aug 25, 2022 • 10

upvoted an article 4 months ago

Article

Thoughts on LoRA Training #1

Jun 18 • 31

upvoted a paper 4 months ago

Autoregressive Model Beats Diffusion: Llama for Scalable Image Generation

Paper • 2406.06525 • Published Jun 10 • 65

upvoted an article 5 months ago

Article

Explaining the SDXL latent space

May 20 • 30

upvoted 2 collections 5 months ago

CommonCatalog

Common Catalog, a dataset with Creative Commons licensed images and machine-generated caption pairs • 8 items • Updated May 16 • 14

CommonCanvas

Collection of models trained on the CommonCatalogue datasets • 8 items • Updated May 16 • 9

upvoted a paper 5 months ago

Hunyuan-DiT: A Powerful Multi-Resolution Diffusion Transformer with Fine-Grained Chinese Understanding

Paper • 2405.08748 • Published May 14 • 19

upvoted a collection 6 months ago

Perturbed Attention Guidance pipelines

Pipelines for Perturbed Attention Guidance with 🧨 library • 8 items • Updated Jun 26 • 6

upvoted a paper 6 months ago

From Parts to Whole: A Unified Reference Framework for Controllable Human Image Generation

Paper • 2404.15267 • Published Apr 23 • 4

upvoted a collection 6 months ago

OpenELM Instruct Models

4 items • Updated 18 days ago • 113

upvoted 2 papers 6 months ago

OpenELM: An Efficient Language Model Family with Open-source Training and Inference Framework

Paper • 2404.14619 • Published Apr 22 • 124

HiDiffusion: Unlocking High-Resolution Creativity and Efficiency in Low-Resolution Trained Diffusion Models

Paper • 2311.17528 • Published Nov 29, 2023 • 4

upvoted a collection 6 months ago

Leaderboards and benchmarks ✨

Cool leaderboard spaces collection for models across modalities! Text, vision, audio, ... • 71 items • Updated about 9 hours ago • 87

upvoted an article 6 months ago

Article

LoRA training scripts of the world, unite!

Jan 2 • 42

upvoted a collection 6 months ago

Phi-3

Phi-3 family of small language and multi-modal models. Language models are available in short- and long-context lengths. • 27 items • Updated Sep 18 • 480

upvoted 4 papers 6 months ago

Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone

Paper • 2404.14219 • Published Apr 22 • 251

Hyper-SD: Trajectory Segmented Consistency Model for Efficient Image Synthesis

Paper • 2404.13686 • Published Apr 21 • 27

VASA-1: Lifelike Audio-Driven Talking Faces Generated in Real Time

Paper • 2404.10667 • Published Apr 16 • 15

HairFastGAN: Realistic and Robust Hair Transfer with a Fast Encoder-Based Approach

Paper • 2404.01094 • Published Apr 1 • 4

upvoted a paper 7 months ago

Natural language guidance of high-fidelity text-to-speech with synthetic annotations

Paper • 2402.01912 • Published Feb 2 • 11

upvoted 3 collections 7 months ago

HF-curated models available on Workers AI

A collection of models curated with Hugging Face that can be run on Cloudflare's Workers AI serverless inference platform. • 15 items • Updated Apr 2 • 51

🎭 Avatars

The latest AI-powered technologies usher in a new era of realistic avatars! 🚀 • 69 items • Updated 1 day ago • 74

VILA: On Pre-training for Visual Language Models

10 items • Updated Aug 21 • 44

upvoted 2 papers 7 months ago

SV3D: Novel Multi-view Synthesis and 3D Generation from a Single Image using Latent Video Diffusion

Paper • 2403.12008 • Published Mar 18 • 19

Fast High-Resolution Image Synthesis with Latent Adversarial Diffusion Distillation

Paper • 2403.12015 • Published Mar 18 • 63

upvoted 7 papers 8 months ago

ELLA: Equip Diffusion Models with LLM for Enhanced Semantic Alignment

Paper • 2403.05135 • Published Mar 8 • 42

StableDrag: Stable Dragging for Point-based Image Editing

Paper • 2403.04437 • Published Mar 7 • 25

Chatbot Arena: An Open Platform for Evaluating LLMs by Human Preference

Paper • 2403.04132 • Published Mar 7 • 38

PixArt-Σ: Weak-to-Strong Training of Diffusion Transformer for 4K Text-to-Image Generation

Paper • 2403.04692 • Published Mar 7 • 40

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

Paper • 2403.03206 • Published Mar 5 • 56

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29 • 134

Panda-70M: Captioning 70M Videos with Multiple Cross-Modality Teachers

Paper • 2402.19479 • Published Feb 29 • 32

upvoted 2 collections 8 months ago

Playground v2.5

2 items • Updated Feb 27 • 23

Gemma release

Groups the Gemma models released by the Google team. • 40 items • Updated Jul 31 • 325

upvoted a paper 8 months ago

Neural Network Diffusion

Paper • 2402.13144 • Published Feb 20 • 94

upvoted a collection 8 months ago

Text-to-Image Base Models

All text-to-image open source base models, with their respective license • 28 items • Updated May 10 • 20

upvoted a paper 8 months ago

MusicMagus: Zero-Shot Text-to-Music Editing via Diffusion Models

Paper • 2402.06178 • Published Feb 9 • 13

upvoted a paper 9 months ago

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Paper • 2402.03300 • Published Feb 5 • 67