
Visual Question Answering (VQA) on Core-MM

Metrics

- Abductive: score on the benchmark's abductive-reasoning questions
- Analogical: score on the analogical-reasoning questions
- Deductive: score on the deductive-reasoning questions
- Overall score: aggregate score across the three reasoning categories
- Params: model parameter count, in billions

Results

Performance of various models on the Core-MM benchmark

| Model Name | Abductive | Analogical | Deductive | Overall score | Params | Paper Title | Repository |
|---|---|---|---|---|---|---|---|
| MiniGPT-v2 | 13.28 | 5.69 | 11.02 | 10.43 | 8B | MiniGPT-4: Enhancing Vision-Language Understanding with Advanced Large Language Models | - |
| BLIP-2-OPT2.7B | 18.96 | 7.5 | 2.76 | 19.31 | 3B | BLIP-2: Bootstrapping Language-Image Pre-training with Frozen Image Encoders and Large Language Models | - |
| GPT-4V | 77.88 | 69.86 | 74.86 | 74.44 | - | GPT-4 Technical Report | - |
| SPHINX v2 | 49.85 | 20.69 | 42.17 | 39.48 | 16B | SPHINX: The Joint Mixing of Weights, Tasks, and Visual Embeddings for Multi-modal Large Language Models | - |
| InstructBLIP | 37.76 | 20.56 | 27.56 | 28.02 | 8B | InstructBLIP: Towards General-purpose Vision-Language Models with Instruction Tuning | - |
| Emu | 36.57 | 18.19 | 28.9 | 28.24 | 14B | Emu: Generative Pretraining in Multimodality | - |
| Otter | 33.64 | 13.33 | 22.49 | 22.69 | 7B | Otter: A Multi-Modal Model with In-Context Instruction Tuning | - |
| CogVLM-Chat | 47.88 | 28.75 | 36.75 | 37.16 | 17B | CogVLM: Visual Expert for Pretrained Language Models | - |
| mPLUG-Owl2 | 20.6 | 7.64 | 23.43 | 20.05 | 7B | mPLUG-Owl2: Revolutionizing Multi-modal Large Language Model with Modality Collaboration | - |
| OpenFlamingo-v2 | 5.3 | 1.11 | 8.88 | 6.82 | 9B | OpenFlamingo: An Open-Source Framework for Training Large Autoregressive Vision-Language Models | - |
| LLaVA-1.5 | 47.91 | 24.31 | 30.94 | 32.62 | 13B | Improved Baselines with Visual Instruction Tuning | - |
| Qwen-VL-Chat | 44.39 | 30.42 | 37.55 | 37.39 | 16B | Qwen-VL: A Versatile Vision-Language Model for Understanding, Localization, Text Reading, and Beyond | - |
| LLaMA-Adapter V2 | 46.12 | 22.08 | 28.7 | 30.46 | 7B | LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model | - |
| InternLM-XComposer-VL | 35.97 | 18.61 | 26.77 | 26.84 | 9B | InternLM-XComposer: A Vision-Language Large Model for Advanced Text-image Comprehension and Composition | - |
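For working with these results programmatically, the sketch below re-ranks the rows by their reported overall score. It is a minimal illustration, not an official HyperAI or Core-MM tool; the `Entry` record type and all helper names are hypothetical, and the numbers are copied verbatim from the table above.

```python
# Minimal sketch (illustrative only): rank the leaderboard rows above
# by reported overall score. The Entry record is hypothetical; the
# values are copied verbatim from the table. params_b is the parameter
# count in billions (None where the table shows "-").
from dataclasses import dataclass
from typing import Optional

@dataclass
class Entry:
    model: str
    abductive: float
    analogical: float
    deductive: float
    overall: float
    params_b: Optional[float]

ENTRIES = [
    Entry("MiniGPT-v2", 13.28, 5.69, 11.02, 10.43, 8),
    Entry("BLIP-2-OPT2.7B", 18.96, 7.50, 2.76, 19.31, 3),
    Entry("GPT-4V", 77.88, 69.86, 74.86, 74.44, None),
    Entry("SPHINX v2", 49.85, 20.69, 42.17, 39.48, 16),
    Entry("InstructBLIP", 37.76, 20.56, 27.56, 28.02, 8),
    Entry("Emu", 36.57, 18.19, 28.90, 28.24, 14),
    Entry("Otter", 33.64, 13.33, 22.49, 22.69, 7),
    Entry("CogVLM-Chat", 47.88, 28.75, 36.75, 37.16, 17),
    Entry("mPLUG-Owl2", 20.60, 7.64, 23.43, 20.05, 7),
    Entry("OpenFlamingo-v2", 5.30, 1.11, 8.88, 6.82, 9),
    Entry("LLaVA-1.5", 47.91, 24.31, 30.94, 32.62, 13),
    Entry("Qwen-VL-Chat", 44.39, 30.42, 37.55, 37.39, 16),
    Entry("LLaMA-Adapter V2", 46.12, 22.08, 28.70, 30.46, 7),
    Entry("InternLM-XComposer-VL", 35.97, 18.61, 26.77, 26.84, 9),
]

# Rank by reported overall score, best first, and show each model's
# gap to the top-ranked entry.
ranked = sorted(ENTRIES, key=lambda e: e.overall, reverse=True)
best = ranked[0].overall
for rank, e in enumerate(ranked, start=1):
    print(f"{rank:2d}. {e.model:<22} overall={e.overall:6.2f}  gap={best - e.overall:6.2f}")
```

Sorting this way puts GPT-4V first by a wide margin: its overall score of 74.44 is nearly double that of the best-scoring open model in the table, SPHINX v2 at 39.48.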