Cuiunbo PRO

AI & ML interests

Anything

Cuiunbo's activity

upvoted a paper 7 days ago

VisRAG: Vision-based Retrieval-augmented Generation on Multi-modality Documents

Paper • 2410.10594 • Published 8 days ago • 21

upvoted a paper 3 months ago

MiniCPM-V: A GPT-4V Level MLLM on Your Phone

Paper • 2408.01800 • Published Aug 3 • 76

upvoted a collection 3 months ago

UI Agent

a collection of algorithmic agents for user interfaces/interactions and program synthesis • 159 items • Updated 4 days ago • 18

upvoted 2 papers 3 months ago

GUICourse: From General Vision Language Models to Versatile GUI Agents

Paper • 2406.11317 • Published Jun 17 • 1

LLaVA-UHD: an LMM Perceiving Any Aspect Ratio and High-Resolution Images

Paper • 2403.11703 • Published Mar 18 • 16

upvoted an article 4 months ago

ColPali: Efficient Document Retrieval with Vision Language Models 👀

Article • Jul 5 • 139

upvoted a paper 4 months ago

CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs

Paper • 2406.18521 • Published Jun 26 • 25

upvoted an article 4 months ago

An Analysis of Chinese LLM Censorship and Bias with Qwen 2 Instruct

Article • Jun 11 • 47

upvoted a paper 5 months ago

Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis

Paper • 2405.21075 • Published May 31 • 18

upvoted a collection 5 months ago

ConvLLaVA

A collection of ConvLLaVA models. • 10 items • Updated May 28 • 10

upvoted 2 papers 5 months ago

Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation

Paper • 2405.14598 • Published May 23 • 11

RoHM: Robust Human Motion Reconstruction via Diffusion

Paper • 2401.08570 • Published Jan 16 • 1

upvoted a collection 5 months ago

MiniCPM-V

17 items • Updated Aug 6 • 1

upvoted a paper 5 months ago

MultiBooth: Towards Generating All Your Concepts in an Image from Text

Paper • 2404.14239 • Published Apr 22 • 8

upvoted a collection 5 months ago

VisionLM

422 items • Updated about 12 hours ago • 26

upvoted a paper 5 months ago

Chameleon: Mixed-Modal Early-Fusion Foundation Models

Paper • 2405.09818 • Published May 16 • 125

upvoted a collection 5 months ago

Tiny Models

3 items • Updated Jun 20 • 1

upvoted a paper 5 months ago

What matters when building vision-language models?

Paper • 2405.02246 • Published May 3 • 98

upvoted a collection 6 months ago

DistilBERT release

Original DistilBERT model, checkpoints obtained from using teacher-student learning from the original BERT checkpoints. • 6 items • Updated Apr 17 • 13

upvoted an article 6 months ago

Introducing Idefics2: A Powerful 8B Vision-Language Model for the community

Article • Apr 15 • 163

upvoted 2 collections 6 months ago

MiniCPM

The MiniCPM family of LLMs and VLLMs. • 31 items • Updated about 7 hours ago • 53

Multimodal Models

Multimodal models with leading performance. • 14 items • Updated about 7 hours ago • 13