MVBench_Leaderboard / result.csv
yinanhe
Type,Model,Language Model,Avg,Action Antonym,Action Count,Action Localization,Action Prediction,Action Sequence,Character Count,Counterfactual Inference,Egocentric Navigation,Episodic Reasoning,Fine grained Action,Fine grained Pose,Moving Attribute,Moving Count,Moving Direction,Object Existence,Object Interaction,Object Shuffle,Scene Transition,State Change,Unexpected Action
LLM,Random,NOLLM,28.0,33.3,33.3,25.0,25.0,25.0,33.3,30.9,25.0,20.0,25.0,25.0,33.3,25.0,25.0,33.3,25.0,33.3,25.0,33.3,25.0
ImageLLM,mPLUG-Owl-I,LLaMA-7B,29.4,44.5,34.5,24.0,20.0,25.0,37.0,37.0,25.5,21.0,27.0,24.0,31.5,22.0,23.0,36.0,24.0,34.0,34.5,40.0,23.5
ImageLLM,LLaMA-Adapter,LLaMA-7B,31.7,51.0,29.0,21.5,28.0,23.0,31.5,32.0,22.5,28.0,30.0,25.0,41.5,22.5,25.5,53.5,32.5,33.5,30.5,39.5,33.0
ImageLLM,BLIP2,FlanT5-XL,31.4,33.5,25.5,26.0,29.0,24.5,30.0,31.0,26.0,37.0,17.0,27.0,40.0,30.0,25.5,51.5,26.0,31.0,32.5,42.0,42.0
ImageLLM,Otter-I,MPT-7B,33.5,39.5,20.0,25.5,32.0,34.5,27.0,36.5,32.0,29.0,30.5,28.0,28.5,32.5,19.0,48.5,44.0,29.5,55.0,39.0,38.5
ImageLLM,MiniGPT-4,Vicuna-7B,18.8,26.0,32.5,12.0,18.0,16.0,29.5,3.0,19.0,9.9,21.5,26.0,8.0,15.5,11.5,29.5,25.5,13.0,9.5,34.0,16.0
ImageLLM,InstructBLIP,Vicuna-7B,32.5,46.0,42.5,23.0,16.5,20.0,30.0,38.0,25.5,30.5,24.5,25.5,40.5,26.5,22.0,51.0,26.0,37.5,46.5,32.0,46.0
ImageLLM,LLaVA,Vicuna-7B,36.0,63.0,34.0,20.5,39.5,28.0,36.0,42.0,27.0,26.5,30.5,25.0,38.5,20.5,23.0,53.0,41.0,41.5,45.0,47.0,39.0
VideoLLM,Otter-V,LLaMA-7B,26.8,27.5,26.0,23.5,23.0,23.0,22.0,19.5,23.5,19.0,27.0,22.0,18.0,28.5,24.5,53.0,28.0,33.0,27.5,38.5,29.5
VideoLLM,mPLUG-Owl-V,LLaMA-7B,29.7,34.0,31.5,23.0,28.0,22.0,31.0,29.5,26.0,20.5,29.0,24.0,40.0,27.0,27.0,40.5,27.0,31.5,29.0,44.0,29.0
VideoLLM,VideoChatGPT,Vicuna-7B,32.7,62.0,30.5,20.0,26.0,23.5,33.0,35.5,29.5,26.0,22.5,29.0,39.5,25.5,23.0,54.0,28.0,40.0,31.0,48.5,26.5
VideoLLM,VideoLLaMA,Vicuna-7B,34.1,51.0,34.0,22.5,25.5,27.5,40.0,37.0,30.0,21.0,29.0,32.5,32.5,22.5,22.5,48.0,40.5,38.0,43.0,45.5,39.0
VideoLLM,VideoChat,Vicuna-7B,35.5,56.0,35.0,27.0,26.5,33.5,41.0,36.0,23.5,23.5,33.5,26.5,42.5,20.5,25.5,53.0,40.5,30.0,48.5,46.0,40.5
VideoLLM,VideoChat2_text,Vicuna-7B,34.7,49.5,41.5,27.0,27.0,24.5,36.0,40.0,33.0,32.0,27.0,26.5,32.5,27.5,25.5,53.0,28.0,40.0,38.5,46.5,38.0
VideoLLM,VideoChat2,Vicuna-7B,51.1,83.5,39.0,23.0,47.5,66.0,36.5,65.5,35.0,40.5,49.5,49.0,58.5,42.0,23.0,58.0,71.5,42.5,88.5,44.0,60.0
VideoLLM,GPT-4V,GPT-4,43.7,72.0,39.0,40.5,63.5,55.5,52.0,11.0,31.0,59.0,46.5,47.5,22.5,12.0,12.0,18.5,59.0,29.5,83.5,45.0,73.5
ImageLLM,GeminiPro,Gemini,37.7,43.7,3.9,40.0,41.8,35.4,38.7,33.7,36.4,36.4,36.2,26.5,41.5,18.0,16.5,43.5,37.5,39.8,75.4,42.3,67.1