Zero-shot Model-based Reinforcement Learning using Large Language Models Paper • 2410.11711 • Published 7 days ago • 6 • 3
Context is Key(NMF): Modelling Topical Information Dynamics in Chinese Diaspora Media Paper • 2410.12791 • Published 6 days ago • 4 • 3
Training Language Models on Synthetic Edit Sequences Improves Code Synthesis Paper • 2410.02749 • Published 19 days ago • 12 • 3
LLaVA-Critic: Learning to Evaluate Multimodal Models Paper • 2410.02712 • Published 19 days ago • 34 • 3
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19 • 46 • 4
Insights from Benchmarking Frontier Language Models on Web App Code Generation Paper • 2409.05177 • Published Sep 8 • 5 • 3
Open Language Data Initiative: Advancing Low-Resource Machine Translation for Karakalpak Paper • 2409.04269 • Published Sep 6 • 9 • 3
Paper Copilot: A Self-Evolving and Efficient LLM System for Personalized Academic Assistance Paper • 2409.04593 • Published Sep 6 • 22 • 2
Benchmarking Chinese Knowledge Rectification in Large Language Models Paper • 2409.05806 • Published Sep 9 • 14 • 3
How Do Your Code LLMs Perform? Empowering Code Instruction Tuning with High-Quality Data Paper • 2409.03810 • Published Sep 5 • 30 • 6
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming Paper • 2408.16725 • Published Aug 29 • 52 • 6
CoRe: Context-Regularized Text Embedding Learning for Text-to-Image Personalization Paper • 2408.15914 • Published Aug 28 • 21 • 6
CSGO: Content-Style Composition in Text-to-Image Generation Paper • 2408.16766 • Published Aug 29 • 17 • 7
CogVLM2: Visual Language Models for Image and Video Understanding Paper • 2408.16500 • Published Aug 29 • 56 • 5
Auxiliary-Loss-Free Load Balancing Strategy for Mixture-of-Experts Paper • 2408.15664 • Published Aug 28 • 11 • 3
Writing in the Margins: Better Inference Pattern for Long Context Retrieval Paper • 2408.14906 • Published Aug 27 • 138 • 11
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer Paper • 2408.06072 • Published Aug 12 • 35 • 6
SwiftBrush v2: Make Your One-step Diffusion Model Better Than Its Teacher Paper • 2408.14176 • Published Aug 26 • 59 • 6
DeepSeek-Prover-V1.5: Harnessing Proof Assistant Feedback for Reinforcement Learning and Monte-Carlo Tree Search Paper • 2408.08152 • Published Aug 15 • 51 • 3
VITA: Towards Open-Source Interactive Omni Multimodal LLM Paper • 2408.05211 • Published Aug 9 • 46 • 3
SeaLLMs 3: Open Foundation and Chat Multilingual Large Language Models for Southeast Asian Languages Paper • 2407.19672 • Published Jul 29 • 54 • 6
Medical SAM 2: Segment medical images as video via Segment Anything Model 2 Paper • 2408.00874 • Published Aug 1 • 41 • 7
MindSearch: Mimicking Human Minds Elicits Deep AI Searcher Paper • 2407.20183 • Published Jul 29 • 37 • 3
CoD, Towards an Interpretable Medical Agent using Chain of Diagnosis Paper • 2407.13301 • Published Jul 18 • 54 • 4
INF-LLaVA: Dual-perspective Perception for High-Resolution Multimodal Large Language Model Paper • 2407.16198 • Published Jul 23 • 13 • 3
GoldFinch: High Performance RWKV/Transformer Hybrid with Linear Pre-Fill and Extreme KV-Cache Compression Paper • 2407.12077 • Published Jul 16 • 52 • 8
EVLM: An Efficient Vision-Language Model for Visual Understanding Paper • 2407.14177 • Published Jul 19 • 42 • 5
DeepSeek-Prover: Advancing Theorem Proving in LLMs through Large-Scale Synthetic Data Paper • 2405.14333 • Published May 23 • 34 • 6
Benchmarking Trustworthiness of Multimodal Large Language Models: A Comprehensive Study Paper • 2406.07057 • Published Jun 11 • 15 • 3
Internal Consistency and Self-Feedback in Large Language Models: A Survey Paper • 2407.14507 • Published Jul 19 • 46 • 9
Qalam : A Multimodal LLM for Arabic Optical Character and Handwriting Recognition Paper • 2407.13559 • Published Jul 18 • 13 • 9
PicoAudio: Enabling Precise Timestamp and Frequency Controllability of Audio Events in Text-to-audio Generation Paper • 2407.02869 • Published Jul 3 • 18 • 5
MambaVision: A Hybrid Mamba-Transformer Vision Backbone Paper • 2407.08083 • Published Jul 10 • 27 • 5
The Synergy between Data and Multi-Modal Large Language Models: A Survey from Co-Development Perspective Paper • 2407.08583 • Published Jul 11 • 10 • 4
Skywork-Math: Data Scaling Laws for Mathematical Reasoning in Large Language Models -- The Story Goes On Paper • 2407.08348 • Published Jul 11 • 50 • 5
LLaVA-NeXT-Interleave: Tackling Multi-image, Video, and 3D in Large Multimodal Models Paper • 2407.07895 • Published Jul 10 • 40 • 3
Internet of Agents: Weaving a Web of Heterogeneous Agents for Collaborative Intelligence Paper • 2407.07061 • Published Jul 9 • 26 • 4
BM25S: Orders of magnitude faster lexical search via eager sparse scoring Paper • 2407.03618 • Published Jul 4 • 11 • 3
Learning Action and Reasoning-Centric Image Editing from Videos and Simulations Paper • 2407.03471 • Published Jul 3 • 27 • 2
MJ-Bench: Is Your Multimodal Reward Model Really a Good Judge for Text-to-Image Generation? Paper • 2407.04842 • Published Jul 5 • 52 • 5
InternLM-XComposer-2.5: A Versatile Large Vision Language Model Supporting Long-Contextual Input and Output Paper • 2407.03320 • Published Jul 3 • 92 • 5
OpenVid-1M: A Large-Scale High-Quality Dataset for Text-to-video Generation Paper • 2407.02371 • Published Jul 2 • 49 • 6
FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds Paper • 2407.01494 • Published Jul 1 • 13 • 2
HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale Paper • 2406.19280 • Published Jun 27 • 59 • 9
ArzEn-LLM: Code-Switched Egyptian Arabic-English Translation and Speech Recognition Using LLMs Paper • 2406.18120 • Published Jun 26 • 6 • 5