Bridging the Performance Gap between DETR and R-CNN for Graphical Object Detection in Document Images
Shehzadi Tahira; Hashmi Khurram Azeem; Stricker Didier; Liwicki Marcus; Afzal Muhammad Zeshan

Abstract
This paper takes an important step in bridging the performance gap between DETR and R-CNN for graphical object detection. Existing graphical object detection approaches have benefited from recent enhancements in CNN-based object detection methods, achieving remarkable progress. Recently, Transformer-based detectors have considerably boosted generic object detection performance, using object queries to eliminate the need for hand-crafted features or post-processing steps such as Non-Maximum Suppression (NMS). However, the effectiveness of such enhanced transformer-based detection algorithms has yet to be verified for the problem of graphical object detection. Inspired by the latest advancements in DETR, we employ the existing detection transformer with a few modifications for graphical object detection. We modify object queries in different ways: using points, using anchor boxes, and adding positive and negative noise to the anchors to boost performance. These modifications allow for better handling of objects with varying sizes and aspect ratios, more robustness to small variations in object positions and sizes, and improved discrimination between objects and non-objects in the image. We evaluate our approach on four graphical datasets: PubTables, TableBank, NTable and PubLaynet. Upon integrating query modifications in DETR, we outperform prior works and achieve new state-of-the-art results with an mAP of 96.9%, 95.7% and 99.3% on TableBank, PubLaynet and PubTables, respectively. The results from extensive ablations show that transformer-based methods are more effective for document analysis, as they are in other applications. We hope this study draws more attention to research on using detection transformers in document image analysis.
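The abstract does not spell out how positive and negative noise is added to the anchors. As a rough illustration only, the sketch below assumes a contrastive-denoising style scheme: each ground-truth box yields one lightly perturbed "positive" anchor that should still be matched to the object, and one more strongly perturbed "negative" anchor that should be classified as a non-object. The function names, the normalized `(cx, cy, w, h)` box format, and the `pos_scale` and `neg_scale` thresholds are assumptions made for this example, not details taken from the paper.

```python
import random


def jitter_box(box, low, high, rng):
    """Perturb a normalized (cx, cy, w, h) box.

    Each relative offset has a magnitude drawn from [low, high] and a random sign.
    """
    cx, cy, w, h = box

    def offset():
        magnitude = rng.uniform(low, high)
        return magnitude if rng.random() < 0.5 else -magnitude

    def clamp(value):
        return min(max(value, 0.0), 1.0)

    # Center shifts are relative to half the box size; width and height are rescaled.
    return (
        clamp(cx + offset() * w / 2),
        clamp(cy + offset() * h / 2),
        clamp(w * (1 + offset())),
        clamp(h * (1 + offset())),
    )


def make_noised_queries(gt_boxes, pos_scale=0.2, neg_scale=0.8, seed=0):
    """Build one positive and one negative noised anchor per ground-truth box.

    pos_scale and neg_scale are hypothetical thresholds: positives use noise
    below pos_scale, negatives use noise between pos_scale and neg_scale.
    """
    rng = random.Random(seed)
    queries = []
    for index, box in enumerate(gt_boxes):
        # Small noise: the query should be trained to recover the original box.
        queries.append(
            {"box": jitter_box(box, 0.0, pos_scale, rng), "target": index, "is_object": True}
        )
        # Larger noise: the query should be trained to predict "no object".
        queries.append(
            {"box": jitter_box(box, pos_scale, neg_scale, rng), "target": index, "is_object": False}
        )
    return queries


if __name__ == "__main__":
    tables = [(0.5, 0.5, 0.2, 0.1), (0.3, 0.6, 0.1, 0.2)]
    for query in make_noised_queries(tables):
        print(query)
```

In a scheme of this kind, the positive anchors encourage robustness to small shifts in position and size, while the negative anchors give the model explicit near-miss examples to reject, which is one plausible reading of the robustness and object/non-object discrimination benefits the abstract describes.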
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| document-layout-analysis-on-publaynet-val | DETR | Figure: 0.975; List: 0.964; Overall: 0.957; Table: 0.981; Text: 0.947; Title: 0.918 |
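The Overall score in this row is numerically consistent with an unweighted mean of the five per-class scores. The page does not state how Overall is computed, so treating it as a simple class average is an assumption; the short check below only confirms the arithmetic.

```python
# Per-class scores copied from the PubLayNet validation row above.
per_class = {
    "Figure": 0.975,
    "List": 0.964,
    "Table": 0.981,
    "Text": 0.947,
    "Title": 0.918,
}

# Assumption: "Overall" is the unweighted mean across the five classes.
overall = sum(per_class.values()) / len(per_class)
print(round(overall, 3))
```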