Submitted by akhaliq · MathVerse: Does Your Multi-modal LLM Truly See the Diagrams in Visual Math Problems? · 11 authors
Submitted by akhaliq · Cobra: Extending Mamba to Multi-Modal Large Language Model for Efficient Inference · 6 authors
Submitted by akhaliq · AnyV2V: A Plug-and-Play Framework For Any Video-to-Video Editing Tasks · 5 authors
Submitted by akhaliq · Efficient Video Diffusion Models via Content-Frame Motion-Latent Decomposition · 6 authors
Submitted by akhaliq · GRM: Large Gaussian Reconstruction Model for Efficient 3D Reconstruction and Generation · 8 authors
Submitted by akhaliq · Gaussian Frosting: Editable Complex Radiance Fields with Real-Time Rendering · 2 authors
Submitted by akhaliq · StyleCineGAN: Landscape Cinemagraph Generation using a Pre-trained StyleGAN · 4 authors
Submitted by akhaliq · Recourse for reclamation: Chatting with generative language models · 4 authors