EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions Paper • 2409.18042 • Published 26 days ago • 36 • 6
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling Paper • 2408.16532 • Published Aug 29 • 46 • 4
Seed-TTS: A Family of High-Quality Versatile Speech Generation Models Paper • 2406.02430 • Published Jun 4 • 29 • 2
MMLU-Pro: A More Robust and Challenging Multi-Task Language Understanding Benchmark Paper • 2406.01574 • Published Jun 3 • 42 • 3
LooseControl: Lifting ControlNet for Generalized Depth Conditioning Paper • 2312.03079 • Published Dec 5, 2023 • 12 • 2
AutoStory: Generating Diverse Storytelling Images with Minimal Human Effort Paper • 2311.11243 • Published Nov 19, 2023 • 14 • 2
GRIM: GRaph-based Interactive narrative visualization for gaMes Paper • 2311.09213 • Published Nov 15, 2023 • 12 • 1