Document Image Classification on RVL-CDIP
Evaluation metrics
Accuracy
Parameters
Evaluation results
Results of each model on this benchmark
| Model | Accuracy | Parameters | Paper Title | Repository |
| --- | --- | --- | --- | --- |
| EAML | 97.70% | - | EAML: Ensemble Self-Attention-based Mutual Learning Network for Document Image Classification | - |
| Cross-Modal | 97.05% | 197M | Visual and Textual Deep Feature Fusion for Document Image Classification | - |
| DocFormerBASE | 96.17% | 183M | DocFormer: End-to-End Transformer for Document Understanding | |
| LayoutLMV3Large | 95.93% | 368M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | |
| LiLT[EN-R]BASE | 95.68% | - | LiLT: A Simple yet Effective Language-Independent Layout Transformer for Structured Document Understanding | |
| LayoutLMv2LARGE | 95.64% | - | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | |
| TILT-Large | 95.52% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | |
| DocFormer large | 95.50% | 536M | DocFormer: End-to-End Transformer for Document Understanding | |
| LayoutLMv3BASE | 95.44% | 133M | LayoutLMv3: Pre-training for Document AI with Unified Text and Image Masking | |
| Donut | 95.3% | - | OCR-free Document Understanding Transformer | |
| TILT-Base | 95.25% | - | Going Full-TILT Boogie on Document Understanding with Text-Image-Layout Transformer | |
| LayoutLMv2BASE | 95.25% | 200M | LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding | |
| LayoutXLM | 95.21% | - | LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding | |
| StrucTexTv2 (large) | 94.62% | 238M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | |
| Pre-trained LayoutLM | 94.42% | 160M | LayoutLM: Pre-training of Text and Layout for Document Image Understanding | |
| DoPTA | 94.12% | 85M | DoPTA: Improving Document Layout Analysis using Patch-Text Alignment | - |
| DocXClassifier-B | 94.00% | 95.4M | DocXClassifier: High Performance Explainable Deep Network for Document Image Classification | - |
| StrucTexTv2 (small) | 93.4% | 28M | StrucTexTv2: Masked Visual-Textual Prediction for Document Image Pre-training | |
| VLCDoC | 93.19% | 217M | VLCDoC: Vision-Language Contrastive Pre-Training Model for Cross-Modal Document Classification | - |
| TransferDoc | 93.18% | 221M | GlobalDoc: A Cross-Modal Vision-Language Framework for Real-World Document Image Retrieval and Classification | - |