Boosting Healthcare LLMs Through Retrieved Context Paper • 2409.15127 • Published 29 days ago • 19
Eliminating Oversaturation and Artifacts of High Guidance Scales in Diffusion Models Paper • 2410.02416 • Published 19 days ago • 25
OmniBooth: Learning Latent Control for Image Synthesis with Multi-modal Instruction Paper • 2410.04932 • Published 15 days ago • 9
LLMs Know More Than They Show: On the Intrinsic Representation of LLM Hallucinations Paper • 2410.02707 • Published 19 days ago • 45
RevisEval: Improving LLM-as-a-Judge via Response-Adapted References Paper • 2410.05193 • Published 15 days ago • 12
DOTS: Learning to Reason Dynamically in LLMs via Optimal Reasoning Trajectories Search Paper • 2410.03864 • Published 18 days ago • 10
Inference Scaling for Long-Context Retrieval Augmented Generation Paper • 2410.04343 • Published 17 days ago • 9
LongGenBench: Long-context Generation Benchmark Paper • 2410.04199 • Published 17 days ago • 17
Everything Everywhere All at Once: LLMs can In-Context Learn Multiple Tasks in Superposition Paper • 2410.05603 • Published 15 days ago • 11
StructRAG: Boosting Knowledge Intensive Reasoning of LLMs via Inference-time Hybrid Information Structurization Paper • 2410.08815 • Published 11 days ago • 36
Rethinking Data Selection at Scale: Random Selection is Almost All You Need Paper • 2410.09335 • Published 11 days ago • 14
VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents Paper • 2410.10594 • Published 8 days ago • 21
Toward General Instruction-Following Alignment for Retrieval-Augmented Generation Paper • 2410.09584 • Published 10 days ago • 43
Animate-X: Universal Character Image Animation with Enhanced Motion Representation Paper • 2410.10306 • Published 9 days ago • 46
What Matters in Transformers? Not All Attention is Needed Paper • 2406.15786 • Published Jun 22 • 27
Improving Long-Text Alignment for Text-to-Image Diffusion Models Paper • 2410.11817 • Published 7 days ago • 13
FlatQuant: Flatness Matters for LLM Quantization Paper • 2410.09426 • Published 11 days ago • 12
Do LLMs Have Political Correctness? Analyzing Ethical Biases and Jailbreak Vulnerabilities in AI Systems Paper • 2410.13334 • Published 6 days ago • 12
MoH: Multi-Head Attention as Mixture-of-Head Attention Paper • 2410.11842 • Published 7 days ago • 19
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published 6 days ago • 24
Harnessing Webpage UIs for Text-Rich Visual Understanding Paper • 2410.13824 • Published 5 days ago • 27
DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation Paper • 2410.13726 • Published 5 days ago • 8
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities Paper • 2410.11190 • Published 8 days ago • 17
FiTv2: Scalable and Improved Flexible Vision Transformer for Diffusion Model Paper • 2410.13925 • Published 5 days ago • 19
SeerAttention: Learning Intrinsic Sparse Attention in Your LLMs Paper • 2410.13276 • Published 6 days ago • 22
Article ColPali: Efficient Document Retrieval with Vision Language Models 👀 By manu • Jul 5 • 139
LongRoPE: Extending LLM Context Window Beyond 2 Million Tokens Paper • 2402.13753 • Published Feb 21 • 111
Llama 3.2 Collection This collection hosts the transformers and original repos of the Llama 3.2 and Llama Guard 3 • 11 items • Updated 27 days ago • 386
A Controlled Study on Long Context Extension and Generalization in LLMs Paper • 2409.12181 • Published Sep 18 • 43
Training Language Models to Self-Correct via Reinforcement Learning Paper • 2409.12917 • Published Sep 19 • 131
YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published Sep 20 • 46
LLaVA-MoD: Making LLaVA Tiny via MoE Knowledge Distillation Paper • 2408.15881 • Published Aug 28 • 20
LongCite: Enabling LLMs to Generate Fine-grained Citations in Long-context QA Paper • 2409.02897 • Published Sep 4 • 44
MemoRAG: Moving towards Next-Gen RAG Via Memory-Inspired Knowledge Discovery Paper • 2409.05591 • Published Sep 9 • 28
LLaMA-Omni: Seamless Speech Interaction with Large Language Models Paper • 2409.06666 • Published Sep 10 • 55
GroUSE: A Benchmark to Evaluate Evaluators in Grounded Question Answering Paper • 2409.06595 • Published Sep 10 • 37
Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers Paper • 2409.04109 • Published Sep 6 • 42
Tora: Trajectory-oriented Diffusion Transformer for Video Generation Paper • 2407.21705 • Published Jul 31 • 25
Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters Paper • 2408.03314 • Published Aug 6 • 33
MMIU: Multimodal Multi-image Understanding for Evaluating Large Vision-Language Models Paper • 2408.02718 • Published Aug 5 • 60
WalledEval: A Comprehensive Safety Evaluation Toolkit for Large Language Models Paper • 2408.03837 • Published Aug 7 • 17
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8 • 154
VITA: Towards Open-Source Interactive Omni Multimodal LLM Paper • 2408.05211 • Published Aug 9 • 46
The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery Paper • 2408.06292 • Published Aug 12 • 115
LongVILA: Scaling Long-Context Visual Language Models for Long Videos Paper • 2408.10188 • Published Aug 19 • 51
Building and better understanding vision-language models: insights and future directions Paper • 2408.12637 • Published Aug 22 • 115
Mixture of Nested Experts: Adaptive Processing of Visual Tokens Paper • 2407.19985 • Published Jul 29 • 34
From Sparse to Dense: GPT-4 Summarization with Chain of Density Prompting Paper • 2309.04269 • Published Sep 8, 2023 • 32