YesBut: A High-Quality Annotated Multimodal Dataset for evaluating Satire Comprehension capability of Vision-Language Models Paper • 2409.13592 • Published Sep 20, 2024 • 46 • 9
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17, 2024 • 57 • 3
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models Paper • 2406.11831 • Published Jun 17, 2024 • 19 • 4
PowerInfer-2: Fast Large Language Model Inference on a Smartphone Paper • 2406.06282 • Published Jun 10, 2024 • 36 • 5
Skywork-MoE: A Deep Dive into Training Techniques for Mixture-of-Experts Language Models Paper • 2406.06563 • Published Jun 3, 2024 • 17 • 10
Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model Paper • 2404.04167 • Published Apr 5, 2024 • 12 • 2
Sigma: Siamese Mamba Network for Multi-Modal Semantic Segmentation Paper • 2404.04256 • Published Apr 5, 2024 • 5 • 1
EE-LLM: Large-Scale Training and Inference of Early-Exit Large Language Models with 3D Parallelism Paper • 2312.04916 • Published Dec 8, 2023 • 6 • 7
LLaMA Beyond English: An Empirical Study on Language Capability Transfer Paper • 2401.01055 • Published Jan 2, 2024 • 53 • 4
Gemini: A Family of Highly Capable Multimodal Models Paper • 2312.11805 • Published Dec 19, 2023 • 45 • 10
A Challenger to GPT-4V? Early Explorations of Gemini in Visual Expertise Paper • 2312.12436 • Published Dec 19, 2023 • 13 • 3
LLM360: Towards Fully Transparent Open-Source LLMs Paper • 2312.06550 • Published Dec 11, 2023 • 56 • 4
VideoSwap: Customized Video Subject Swapping with Interactive Semantic Point Correspondence Paper • 2312.02087 • Published Dec 4, 2023 • 20 • 5
QMoE: Practical Sub-1-Bit Compression of Trillion-Parameter Models Paper • 2310.16795 • Published Oct 25, 2023 • 26 • 3
BitNet: Scaling 1-bit Transformers for Large Language Models Paper • 2310.11453 • Published Oct 17, 2023 • 96 • 13