Submitted by akhaliq 26 Video-LLaVA: Learning United Visual Representation by Alignment Before Projection · 6 authors 1
Submitted by akhaliq 24 Emu Video: Factorizing Text-to-Video Generation by Explicit Image Conditioning · 10 authors 3
Submitted by akhaliq 23 Rethinking Attention: Exploring Shallow Feed-Forward Neural Networks as an Alternative to Attention Layers in Transformers · 4 authors 1
Submitted by akhaliq 18 Camels in a Changing Climate: Enhancing LM Adaptation with Tulu 2 · 11 authors 5
Submitted by akhaliq 15 MetaDreamer: Efficient Text-to-3D Creation With Disentangling Geometry and Texture · 5 authors 1
Submitted by akhaliq 14 SelfEval: Leveraging the discriminative nature of generative models for evaluation · 2 authors
Submitted by akhaliq 7 I&S-ViT: An Inclusive & Stable Method for Pushing the Limit of Post-Training ViTs Quantization · 5 authors
Submitted by akhaliq 5 Distilling and Retrieving Generalizable Knowledge for Robot Manipulation via Language Corrections · 8 authors
Submitted by akhaliq 4 UnifiedVisionGPT: Streamlining Vision-Oriented AI through Generalized Multimodal Framework · 9 authors