DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception

Zhiyuan Zhao Hengrui Kang Bin Wang Conghui He

Abstract

Document Layout Analysis is crucial for real-world document understanding systems, but it encounters a challenging trade-off between speed and accuracy: multimodal methods leveraging both text and visual features achieve higher accuracy but suffer from significant latency, whereas unimodal methods relying solely on visual features offer faster processing speeds at the expense of accuracy. To address this dilemma, we introduce DocLayout-YOLO, a novel approach that enhances accuracy while maintaining speed advantages through document-specific optimizations in both pre-training and model design. For robust document pre-training, we introduce the Mesh-candidate BestFit algorithm, which frames document synthesis as a two-dimensional bin packing problem, generating the large-scale, diverse DocSynth-300K dataset. Pre-training on the resulting DocSynth-300K dataset significantly improves fine-tuning performance across various document types. In terms of model optimization, we propose a Global-to-Local Controllable Receptive Module that is capable of better handling multi-scale variations of document elements. Furthermore, to validate performance across different document types, we introduce a complex and challenging benchmark named DocStructBench. Extensive experiments on downstream datasets demonstrate that DocLayout-YOLO excels in both speed and accuracy. Code, data, and models are available at https://github.com/opendatalab/DocLayout-YOLO.
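
To make the "document synthesis as two-dimensional bin packing" framing concrete, the sketch below shows a generic greedy best-fit heuristic that places element boxes onto a blank page. It is not the Mesh-candidate BestFit algorithm itself, whose details the abstract does not give. The names `Rect` and `best_fit_layout`, the fill-ratio scoring rule, and the guillotine-style splitting of free space are all assumptions made for illustration.

```python
from typing import List, NamedTuple, Optional, Tuple


class Rect(NamedTuple):
    """Axis-aligned box: top-left corner (x, y), width w, height h."""
    x: int
    y: int
    w: int
    h: int


def best_fit_layout(
    page_w: int, page_h: int, elements: List[Tuple[int, int]]
) -> List[Rect]:
    """Greedily place (w, h) elements on a page (hypothetical helper).

    At each step, pick the (element, free cell) pair with the highest
    fill ratio, i.e. element area divided by free-cell area.
    """
    if page_w <= 0 or page_h <= 0:
        raise ValueError("page dimensions must be positive")

    free: List[Rect] = [Rect(0, 0, page_w, page_h)]  # unoccupied cells
    remaining = [(w, h) for (w, h) in elements if w > 0 and h > 0]
    placed: List[Rect] = []

    while remaining and free:
        best: Optional[Tuple[float, int, int]] = None  # (ratio, elem idx, cell idx)
        for ei, (w, h) in enumerate(remaining):
            for fi, cell in enumerate(free):
                if w <= cell.w and h <= cell.h:
                    ratio = (w * h) / (cell.w * cell.h)
                    if best is None or ratio > best[0]:
                        best = (ratio, ei, fi)
        if best is None:
            break  # no remaining element fits any free cell

        _, ei, fi = best
        w, h = remaining.pop(ei)
        cell = free.pop(fi)
        placed.append(Rect(cell.x, cell.y, w, h))

        # Split the leftover space into a strip to the right of the
        # element and a strip below it (guillotine split).
        if cell.w > w:
            free.append(Rect(cell.x + w, cell.y, cell.w - w, h))
        if cell.h > h:
            free.append(Rect(cell.x, cell.y + h, cell.w, cell.h - h))

    return placed


if __name__ == "__main__":
    # Toy element pool: (width, height) of candidate layout blocks.
    for box in best_fit_layout(100, 100, [(60, 40), (40, 40), (100, 60), (30, 30)]):
        print(box)
```

In a synthesis setting, the returned boxes would be filled with cropped text blocks, figures, or tables to form one synthetic page. Scoring by fill ratio favors elements that closely match the available space, which is one simple way to obtain dense, varied layouts.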

Code Repositories

opendatalab/PDF-Extract-Kit (PyTorch; mentioned in GitHub)
opendatalab/DocLayout-YOLO (official; PyTorch; mentioned in GitHub)

Benchmarks

Benchmark: document-layout-analysis-on-d4la
Methodology: DocLayout-YOLO
Metrics: mAP: 70.3
