HyperAI超神经

Visual Question Answering on DocVQA Test

Evaluation Metric

ANLS
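
ANLS stands for Average Normalized Levenshtein Similarity. The sketch below illustrates how such a score is commonly computed and is not the official evaluation script for this benchmark. It assumes a similarity threshold of 0.5, normalization by the longer of the two strings, and case-insensitive matching after trimming whitespace. The function names `levenshtein` and `anls` are hypothetical and chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between two strings, using a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def anls(predictions, references, threshold=0.5):
    """Mean over questions of the best thresholded similarity.

    predictions: list of predicted answer strings, one per question.
    references:  list of lists of accepted ground-truth answers.
    threshold:   assumed cut-off; normalized distances at or above it score 0.
    """
    scores = []
    for pred, answers in zip(predictions, references):
        p = pred.strip().lower()
        best = 0.0
        for ans in answers:
            a = ans.strip().lower()
            longest = max(len(p), len(a))
            # Normalized Levenshtein distance in [0, 1]; two empty strings match.
            nl = levenshtein(p, a) / longest if longest else 0.0
            similarity = 1.0 - nl if nl < threshold else 0.0
            best = max(best, similarity)
        scores.append(best)
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    preds = ["Hello", "helo", "xyz"]
    refs = [["hello"], ["hello", "hullo"], ["hello"]]
    print(anls(preds, refs))
```

Under these assumptions, an exact match scores 1, a near miss earns partial credit, and an answer whose normalized distance reaches the threshold scores 0. A metric of this shape tolerates small OCR or spelling errors while still rejecting unrelated answers.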

Evaluation Results

Performance of each model on this benchmark

Comparison Table

| Model Name | ANLS |
| --- | --- |
| matcha-enhancing-visual-language-pretraining | 0.742 |
| layout-and-task-aware-instruction-prompt-for | 0.884 |
| pali-3-vision-language-models-smaller-faster | 0.876 |
| qwen-vl-a-frontier-large-vision-language | 0.651 |
| ernie-layout-layout-knowledge-enhanced-pre | 0.8486 |
| dublin-document-understanding-by-language | 0.782 |
| pix2struct-screenshot-parsing-as-pretraining | 0.721 |
| dublin-document-understanding-by-language | 0.803 |
| pali-3-vision-language-models-smaller-faster | 0.886 |
| qwen-vl-a-frontier-large-vision-language | 0.9024 |
| pali-x-on-scaling-up-a-multilingual-vision | 0.868 |
| pali-x-on-scaling-up-a-multilingual-vision | 0.80 |
| layout-and-task-aware-instruction-prompt-for | 0.8336 |
| going-full-tilt-boogie-on-document | 0.8705 |
| docvqa-a-dataset-for-vqa-on-document-images | 0.665 |
| docformerv2-local-features-for-document | 0.8784 |
| multi-label-cluster-discrimination-for-visual | 0.916 |
| unifying-vision-text-and-layout-for-universal | 0.878 |
| unifying-vision-text-and-layout-for-universal | 0.847 |
| omni-smola-boosting-generalist-multimodal | 0.906 |
| ernie-layout-layout-knowledge-enhanced-pre | 0.8841 |
| going-full-tilt-boogie-on-document | 0.8392 |
| layoutlmv2-multi-modal-pre-training-for | 0.8672 |
| omni-smola-boosting-generalist-multimodal | 0.908 |
| donut-document-understanding-transformer | 0.675 |
| end-to-end-document-recognition-and | 0.632 |
| docvqa-a-dataset-for-vqa-on-document-images | 0.9436 |
| qwen-vl-a-frontier-large-vision-language | 0.626 |
| layoutlmv2-multi-modal-pre-training-for | 0.7808 |
| pali-x-on-scaling-up-a-multilingual-vision | 0.809 |
| layout-and-task-aware-instruction-prompt-for | 0.8255 |
| screenai-a-vision-language-model-for-ui-and | 0.8988 |
| pix2struct-screenshot-parsing-as-pretraining | 0.766 |