5 months ago

A Graphical Approach to Document Layout Analysis

Wang Jilin ; Krumdick Michael ; Tong Baojia ; Halim Hamima ; Sokolov Maxim ; Barda Vadym ; Vendryes Delphine ; Tanner Chris

Abstract

Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets, while being an order of magnitude smaller than existing models. In particular, the 4-million parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.
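
The abstract frames DLA as graph segmentation over a page graph but does not spell out how the graph is built. The following is a minimal sketch of that general idea, not the authors' implementation: text boxes become nodes, nearby boxes are joined by candidate edges, and connected components of the kept edges form layout segments. The `Box` schema, the `gap` distance, and the `max_gap` threshold are all illustrative assumptions; in a model like GLAM a learned graph neural network would decide which edges to keep and which class each node belongs to, whereas here a fixed distance threshold stands in for that learned decision.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Box:
    """A text box parsed from PDF metadata (hypothetical schema)."""
    x0: float
    y0: float
    x1: float
    y1: float
    text: str


def gap(a: Box, b: Box) -> float:
    """Euclidean gap between two axis-aligned boxes; 0.0 if they touch or overlap."""
    dx = max(a.x0 - b.x1, b.x0 - a.x1, 0.0)
    dy = max(a.y0 - b.y1, b.y0 - a.y1, 0.0)
    return (dx * dx + dy * dy) ** 0.5


def build_edges(boxes, max_gap):
    """Candidate edges: connect every pair of boxes whose gap is at most max_gap."""
    return [
        (i, j)
        for i, j in combinations(range(len(boxes)), 2)
        if gap(boxes[i], boxes[j]) <= max_gap
    ]


def segment(n_nodes, kept_edges):
    """Group nodes into connected components using union-find."""
    parent = list(range(n_nodes))

    def find(i):
        # Walk to the root, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in kept_edges:
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n_nodes):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Toy page: one title line, then two body lines that sit close together.
boxes = [
    Box(50, 50, 300, 62, "A Graphical Approach"),
    Box(50, 100, 300, 110, "Document layout analysis"),
    Box(50, 112, 300, 122, "is the task of ..."),
]
edges = build_edges(boxes, max_gap=5.0)  # assumed threshold, in page units
segments = segment(len(boxes), edges)
print(segments)  # [[0], [1, 2]]
```

On this toy page the two body lines are 2 units apart and end up in one segment, while the title line, 38 units above them, stays on its own. Replacing the threshold with a trained edge classifier, and adding a per-node class prediction, would move the sketch closer to the segmentation-plus-classification framing the abstract describes.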

Code Repositories

ivanstepanovftw/glam
pytorch
Mentioned in GitHub

Benchmarks

Benchmark: document-layout-analysis-on-publaynet-val
Methodology: GLAM
Metrics:
Figure: 0.206
List: 0.862
Overall: 0.722
Table: 0.868
Text: 0.878
Title: 0.800
