Submitted by akhaliq · 86 · Loopy: Taming Audio-Driven Portrait Avatar with Long-Term Motion Dependency · 6 authors · 12
Submitted by Xidong · 54 · LongLLaVA: Scaling Multi-modal LLMs to 1000 Images Efficiently via Hybrid Architecture · 5 authors · 2
Submitted by NeoZ123 · 44 · LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA · 11 authors · 3
Submitted by akhaliq · 28 · MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark · 14 authors · 3
Submitted by akhaliq · 17 · Arctic-SnowCoder: Demystifying High-Quality Data in Code Pretraining · 3 authors · 2
Submitted by akhaliq · 9 · FastVoiceGrad: One-step Diffusion-Based Voice Conversion with Adversarial Conditional Diffusion Distillation · 4 authors · 2
Submitted by davanstrien · 8 · Political DEBATE: Efficient Zero-shot and Few-shot Classifiers for Political Text · 4 authors · 3