Zero-shot Model-based Reinforcement Learning using Large Language Models Paper • 2410.11711 • Published 7 days ago • 6
CompassJudger-1: All-in-one Judge Model Helps Model Evaluation and Evolution Paper • 2410.16256 • Published about 24 hours ago • 51
Pangea Collection A Fully Open Multilingual Multimodal LLM for 39 Languages • 8 items • Updated about 14 hours ago • 4
Pangea: A Fully Open Multilingual Multimodal LLM for 39 Languages Paper • 2410.16153 • Published 1 day ago • 25
AutoTrain: No-code training for state-of-the-art models Paper • 2410.15735 • Published 1 day ago • 38
WorldCuisines: A Massive-Scale Benchmark for Multilingual and Multicultural Visual Question Answering on Global Cuisines Paper • 2410.12705 • Published 6 days ago • 24
Mini-Omni2: Towards Open-source GPT-4o with Vision, Speech and Duplex Capabilities Paper • 2410.11190 • Published 8 days ago • 17
Can MLLMs Understand the Deep Implication Behind Chinese Images? Paper • 2410.13854 • Published 5 days ago • 7
Janus: Decoupling Visual Encoding for Unified Multimodal Understanding and Generation Paper • 2410.13848 • Published 5 days ago • 27
VidEgoThink: Assessing Egocentric Video Understanding Capabilities for Embodied AI Paper • 2410.11623 • Published 7 days ago • 45
MEGA-Bench: Scaling Multimodal Evaluation to over 500 Real-World Tasks Paper • 2410.10563 • Published 8 days ago • 34
LOKI: A Comprehensive Synthetic Data Detection Benchmark using Large Multimodal Models Paper • 2410.09732 • Published 10 days ago • 53
Ovis1.6 Collection With just 10B parameters, Ovis1.6-Gemma2-9B leads the OpenCompass benchmark among open-source MLLMs within 30B parameters. • 2 items • Updated 6 days ago • 2
F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching Paper • 2410.06885 • Published 13 days ago • 33
Meissonic: Revitalizing Masked Generative Transformers for Efficient High-Resolution Text-to-Image Synthesis Paper • 2410.08261 • Published 12 days ago • 45
Aria: An Open Multimodal Native Mixture-of-Experts Model Paper • 2410.05993 • Published 14 days ago • 104
MathCoder2: Better Math Reasoning from Continued Pretraining on Model-translated Mathematical Code Paper • 2410.08196 • Published 12 days ago • 44
General Preference Modeling with Preference Representations for Aligning Language Models Paper • 2410.02197 • Published 20 days ago • 6
Addition is All You Need for Energy-efficient Language Models Paper • 2410.00907 • Published 21 days ago • 138
Article Does Daily Software Engineering Work Need Reasoning Models? By onekq • 29 days ago • 5
OmniBench: Towards The Future of Universal Omni-Language Models Paper • 2409.15272 • Published 29 days ago • 25
Programming Every Example: Lifting Pre-training Data Quality like Experts at Scale Paper • 2409.17115 • Published 27 days ago • 59
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models Paper • 2409.16191 • Published 28 days ago • 41
ViewCrafter: Taming Video Diffusion Models for High-fidelity Novel View Synthesis Paper • 2409.02048 • Published Sep 3 • 1
Oryx Collection Oryx: One Multi-Modal LLM for On-Demand Spatial-Temporal Understanding • 7 items • Updated about 2 hours ago • 11
InfiMM-WebMath-40B: Advancing Multimodal Pre-Training for Enhanced Mathematical Reasoning Paper • 2409.12568 • Published Sep 19 • 46
jina-embeddings-v3 Collection Multilingual multi-task general text embedding model • 6 items • Updated Sep 19 • 14
Qwen2-VL: Enhancing Vision-Language Model's Perception of the World at Any Resolution Paper • 2409.12191 • Published Sep 18 • 72
Moshi v0.1 Release Collection MLX, Candle & PyTorch model checkpoints released as part of the Moshi release from Kyutai. Run inference via: https://github.com/kyutai-labs/moshi • 13 items • Updated Sep 18 • 211
Qwen2.5 Collection Qwen2.5 language models, both pretrained and instruction-tuned, in 7 sizes: 0.5B, 1.5B, 3B, 7B, 14B, 32B, and 72B. • 45 items • Updated Sep 18 • 268
Eagle: Exploring The Design Space for Multimodal LLMs with Mixture of Encoders Paper • 2408.15998 • Published Aug 28 • 83
DSBench: How Far Are Data Science Agents to Becoming Data Science Experts? Paper • 2409.07703 • Published Sep 12 • 66
Seed-Music: A Unified Framework for High Quality and Controlled Music Generation Paper • 2409.09214 • Published Sep 13 • 45
Robust Dual Gaussian Splatting for Immersive Human-centric Volumetric Videos Paper • 2409.08353 • Published Sep 12 • 10
General OCR Theory: Towards OCR-2.0 via a Unified End-to-end Model Paper • 2409.01704 • Published Sep 3 • 80
IFAdapter: Instance Feature Control for Grounded Text-to-Image Generation Paper • 2409.08240 • Published Sep 12 • 15
Insights from Benchmarking Frontier Language Models on Web App Code Generation Paper • 2409.05177 • Published Sep 8 • 5
Article All LLMs Write Great Code, But Some Make (A Lot) Fewer Mistakes By onekq • Sep 12 • 4