Submitted by akhaliq 30 TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones · 3 authors 5
Submitted by akhaliq 26 Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action · 8 authors 2
Submitted by akhaliq 25 Generative AI for Math: Part I -- MathPile: A Billion-Token-Scale Pretraining Corpus for Math · 3 authors 11
Submitted by akhaliq 19 MobileVLM: A Fast, Reproducible and Strong Vision Language Assistant for Mobile Devices · 11 authors 2
Submitted by akhaliq 15 DL3DV-10K: A Large-Scale Scene Dataset for Deep Learning-based 3D Vision · 20 authors 4
Submitted by akhaliq 13 City-on-Web: Real-time Neural Rendering of Large-scale Scenes on the Web · 2 authors 1
Submitted by akhaliq 13 I2V-Adapter: A General Image-to-Video Adapter for Video Diffusion Models · 11 authors 1
Submitted by akhaliq 9 Spacetime Gaussian Feature Splatting for Real-Time Dynamic View Synthesis · 4 authors 2
Submitted by akhaliq 7 Hyper-VolTran: Fast and Generalizable One-Shot Image to 3D Object Structure via HyperNetworks · 6 authors 1
Submitted by akhaliq 6 SSR-Encoder: Encoding Selective Subject Representation for Subject-Driven Generation · 11 authors 1
Submitted by akhaliq 6 PanGu-Draw: Advancing Resource-Efficient Text-to-Image Synthesis with Time-Decoupled Training and Reusable Coop-Diffusion · 10 authors 1
Submitted by akhaliq 5 DiffusionGAN3D: Boosting Text-guided 3D Generation and Domain Adaption by Combining 3D GANs and Diffusion Priors · 5 authors 1