sdtana

sdtana's activity

upvoted 3 papers about 1 month ago

Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think

Paper • 2409.11355 • Published Sep 17 • 27

OmniGen: Unified Image Generation

Paper • 2409.11340 • Published Sep 17 • 83

Open-MAGVIT2: An Open-Source Project Toward Democratizing Auto-regressive Visual Generation

Paper • 2409.04410 • Published Sep 6 • 23

upvoted a paper about 2 months ago

FLUX that Plays Music

Paper • 2409.00587 • Published Sep 1 • 31

upvoted 2 papers 3 months ago

Efficient Training with Denoised Neural Weights

Paper • 2407.11966 • Published Jul 16 • 8

Vision language models are blind

Paper • 2407.06581 • Published Jul 9 • 82

upvoted 3 papers 4 months ago

Dataset Size Recovery from LoRA Weights

Paper • 2406.19395 • Published Jun 27 • 18

Glyph-ByT5-v2: A Strong Aesthetic Baseline for Accurate Multilingual Visual Text Rendering

Paper • 2406.10208 • Published Jun 14 • 21

An Image is Worth More Than 16x16 Patches: Exploring Transformers on Individual Pixels

Paper • 2406.09415 • Published Jun 13 • 50

upvoted an article 5 months ago

An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct

Article • Jun 11 • 47

upvoted 8 papers 5 months ago

Interpreting the Second-Order Effects of Neurons in CLIP

Paper • 2406.04341 • Published Jun 6 • 2

Step-aware Preference Optimization: Aligning Preference with Denoising Performance at Each Step

Paper • 2406.04314 • Published Jun 6 • 26

upvoted a paper 6 months ago

Scaling (Down) CLIP: A Comprehensive Analysis of Data, Architecture, and Training Strategies

Paper • 2404.08197 • Published Apr 12 • 27

upvoted 3 papers 7 months ago

Aligning Diffusion Models by Optimizing Human Utility

Paper • 2404.04465 • Published Apr 6 • 13

CoMat: Aligning Text-to-Image Diffusion Model with Image-to-Text Concept Matching

Paper • 2404.03653 • Published Apr 4 • 33

Getting it Right: Improving Spatial Consistency in Text-to-Image Models

Paper • 2404.01197 • Published Apr 1 • 30

upvoted 3 papers 9 months ago

Direct Language Model Alignment from Online AI Feedback

Paper • 2402.04792 • Published Feb 7 • 28

Scalable Pre-training of Large Autoregressive Image Models

Paper • 2401.08541 • Published Jan 16 • 35

Parrot: Pareto-optimal Multi-Reward Reinforcement Learning Framework for Text-to-Image Generation

Paper • 2401.05675 • Published Jan 11 • 20

upvoted 6 papers 10 months ago

Unlocking Pre-trained Image Backbones for Semantic Image Synthesis

Paper • 2312.13314 • Published Dec 20, 2023 • 7

StreamDiffusion: A Pipeline-level Solution for Real-time Interactive Generation

Paper • 2312.12491 • Published Dec 19, 2023 • 69

SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing

Paper • 2312.11392 • Published Dec 18, 2023 • 19

Amphion: An Open-Source Audio, Music and Speech Generation Toolkit

Paper • 2312.09911 • Published Dec 15, 2023 • 52

A Picture is Worth More Than 77 Text Tokens: Evaluating CLIP-Style Models on Dense Captions

Paper • 2312.08578 • Published Dec 14, 2023 • 16

Fast Training of Diffusion Transformer with Extreme Masking for 3D Point Clouds Generation

Paper • 2312.07231 • Published Dec 12, 2023 • 6

upvoted 8 papers 11 months ago

Scaling Laws of Synthetic Images for Model Training ... for Now

Paper • 2312.04567 • Published Dec 7, 2023 • 7

LooseControl: Lifting ControlNet for Generalized Depth Conditioning

Paper • 2312.03079 • Published Dec 5, 2023 • 12

Cache Me if You Can: Accelerating Diffusion Models through Block Caching

Paper • 2312.03209 • Published Dec 6, 2023 • 17

Analyzing and Improving the Training Dynamics of Diffusion Models

Paper • 2312.02696 • Published Dec 5, 2023 • 31

LucidDreamer: Domain-free Generation of 3D Gaussian Splatting Scenes

Paper • 2311.13384 • Published Nov 22, 2023 • 50

Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model

Paper • 2311.13231 • Published Nov 22, 2023 • 26

ShareGPT4V: Improving Large Multi-Modal Models with Better Captions

Paper • 2311.12793 • Published Nov 21, 2023 • 18

To See is to Believe: Prompting GPT-4V for Better Visual Instruction Tuning

Paper • 2311.07574 • Published Nov 13, 2023 • 14

upvoted 5 papers 12 months ago

GPT4All: An Ecosystem of Open Source Compressed Language Models

Paper • 2311.04931 • Published Nov 6, 2023 • 20

LLaVA-Plus: Learning to Use Tools for Creating Multimodal Agents

Paper • 2311.05437 • Published Nov 9, 2023 • 45

LCM-LoRA: A Universal Stable-Diffusion Acceleration Module

Paper • 2311.05556 • Published Nov 9, 2023 • 79

CapsFusion: Rethinking Image-Text Data at Scale

Paper • 2310.20550 • Published Oct 31, 2023 • 25

Beyond U: Making Diffusion Models Faster & Lighter

Paper • 2310.20092 • Published Oct 31, 2023 • 11