DinoBloom: A Foundation Model for Generalizable Cell Embeddings in Hematology Paper • 2404.05022 • Published Apr 7 • 2
TurtleBench: Evaluating Top Language Models via Real-World Yes/No Puzzles Paper • 2410.05262 • Published 15 days ago • 9
VideoGuide: Improving Video Diffusion Models without Training Through a Teacher's Guide Paper • 2410.04364 • Published 17 days ago • 26
TLDR: Token-Level Detective Reward Model for Large Vision Language Models Paper • 2410.04734 • Published 16 days ago • 15
Grounding Language in Multi-Perspective Referential Communication Paper • 2410.03959 • Published 18 days ago • 3
Presto! Distilling Steps and Layers for Accelerating Music Generation Paper • 2410.05167 • Published 15 days ago • 15
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning Paper • 2410.00255 • Published 22 days ago • 5
SciPrompt: Knowledge-augmented Prompting for Fine-grained Categorization of Scientific Topics Paper • 2410.01946 • Published 20 days ago • 4
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data Paper • 2410.02056 • Published 20 days ago • 4
Layer Swapping for Zero-Shot Cross-Lingual Transfer in Large Language Models Paper • 2410.01335 • Published 21 days ago • 5
Learning the Latent Rules of a Game from Data: A Chess Story Paper • 2410.02426 • Published 19 days ago • 5
Improving Autonomous AI Agents with Reflective Tree Search and Self-Learning Paper • 2410.02052 • Published 20 days ago • 8
MedVisionLlama: Leveraging Pre-Trained Large Language Model Layers to Enhance Medical Image Segmentation Paper • 2410.02458 • Published 19 days ago • 9
Vinoground: Scrutinizing LMMs over Dense Temporal Reasoning with Short Videos Paper • 2410.02763 • Published 19 days ago • 7
L-CiteEval: Do Long-Context Models Truly Leverage Context for Responding? Paper • 2410.02115 • Published 20 days ago • 10
Interpreting and Editing Vision-Language Representations to Mitigate Hallucinations Paper • 2410.02762 • Published 19 days ago • 9
MVGS: Multi-view-regulated Gaussian Splatting for Novel View Synthesis Paper • 2410.02103 • Published 20 days ago • 8
SageAttention: Accurate 8-Bit Attention for Plug-and-play Inference Acceleration Paper • 2410.02367 • Published 19 days ago • 45
CLIP-MoE: Towards Building Mixture of Experts for CLIP with Diversified Multiplet Upcycling Paper • 2409.19291 • Published 24 days ago • 18
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis Paper • 2410.02749 • Published 19 days ago • 12
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models Paper • 2410.02416 • Published 19 days ago • 25
Distilling an End-to-End Voice Assistant Without Instruction Training Data Paper • 2410.02678 • Published 19 days ago • 21
VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment Paper • 2410.01679 • Published 20 days ago • 22
Depth Pro: Sharp Monocular Metric Depth in Less Than a Second Paper • 2410.02073 • Published 20 days ago • 38
Loong: Generating Minute-level Long Videos with Autoregressive Language Models Paper • 2410.02757 • Published 19 days ago • 36
Revisit Large-Scale Image-Caption Data in Pre-training Multimodal Foundation Models Paper • 2410.02740 • Published 19 days ago • 52
Closed-loop Long-horizon Robotic Planning via Equilibrium Sequence Modeling Paper • 2410.01440 • Published 20 days ago • 3
EVER: Exact Volumetric Ellipsoid Rendering for Real-time View Synthesis Paper • 2410.01804 • Published 20 days ago • 5
HelpSteer2-Preference: Complementing Ratings with Preferences Paper • 2410.01257 • Published 21 days ago • 16
Quantifying Generalization Complexity for Large Language Models Paper • 2410.01769 • Published 20 days ago • 13
ComfyGen: Prompt-Adaptive Workflows for Text-to-Image Generation Paper • 2410.01731 • Published 20 days ago • 15
LEOPARD: A Vision Language Model For Text-Rich Multi-Image Tasks Paper • 2410.01744 • Published 20 days ago • 23
From Code to Correctness: Closing the Last Mile of Code Generation with Hierarchical Debugging Paper • 2410.01215 • Published 21 days ago • 30
RATIONALYST: Pre-training Process-Supervision for Improving Reasoning Paper • 2410.01044 • Published 21 days ago • 34
IDEAW: Robust Neural Audio Watermarking with Invertible Dual-Embedding Paper • 2409.19627 • Published 23 days ago • 1
Coffee-Gym: An Environment for Evaluating and Improving Natural Language Feedback on Erroneous Code Paper • 2409.19715 • Published 23 days ago • 8
Cottention: Linear Transformers With Cosine Attention Paper • 2409.18747 • Published 25 days ago • 15
Scaling Proprioceptive-Visual Learning with Heterogeneous Pre-trained Transformers Paper • 2409.20537 • Published 22 days ago • 11
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models Paper • 2409.20551 • Published 22 days ago • 13
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models Paper • 2409.18943 • Published 25 days ago • 26