SRFormer: Text Detection Transformer with Incorporated Segmentation and Regression

Qingwen Bu, Sungrae Park, Minsoo Khang, Yichuan Cheng

Abstract

Existing text detection techniques fall broadly into two groups: segmentation-based and regression-based methods. Segmentation models are more robust to font variations but require intricate post-processing, which leads to high computational overhead. Regression-based methods make instance-aware predictions but are limited in robustness and data efficiency because they rely on high-level representations. We propose SRFormer, a unified DETR-based model that combines Segmentation and Regression, aiming to pair the inherent robustness of segmentation representations with the straightforward post-processing of instance-level regression. Our empirical analysis indicates that favorable segmentation predictions can be obtained at the initial decoder layers. We therefore restrict the segmentation branches to the first few decoder layers and apply progressive regression refinement in subsequent layers, achieving performance gains while minimizing the computational load from the mask. Furthermore, we propose a Mask-informed Query Enhancement module, which takes the segmentation result as a natural soft-ROI to pool and extract robust pixel representations; these are then used to enhance and diversify instance queries. Extensive experiments across multiple benchmarks highlight our method's exceptional robustness, superior training and data efficiency, and state-of-the-art performance. Our code is available at https://github.com/retsuh-bqw/SRFormer-Text-Det.
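The idea of using a segmentation result as a soft-ROI can be illustrated with a small sketch. The code below is not the authors' implementation: the function names `soft_roi_pool` and `enhance_query`, the flattened list layout, and the residual addition to the query are all assumptions made for illustration. It only shows the general technique of mask-weighted average pooling of pixel features, followed by adding the pooled vector to an instance query.

```python
def soft_roi_pool(features, mask, eps=1e-6):
    """Mask-weighted average of per-pixel feature vectors.

    features: list of N pixel feature vectors (each a list of C floats)
    mask:     list of N soft mask probabilities in [0, 1]
    """
    if len(features) != len(mask):
        raise ValueError("features and mask must cover the same pixels")
    channels = len(features[0])
    total = sum(mask) + eps  # eps avoids division by zero for an empty mask
    return [
        sum(m * f[c] for m, f in zip(mask, features)) / total
        for c in range(channels)
    ]


def enhance_query(query, features, mask):
    """Add the pooled pixel representation to an instance query.

    The residual form (query + pooled) is an assumption of this sketch.
    """
    pooled = soft_roi_pool(features, mask)
    return [q + p for q, p in zip(query, pooled)]


# Toy example: 4 pixels with 2-channel features; the mask selects the first two.
features = [[1.0, 0.0], [3.0, 2.0], [10.0, 10.0], [20.0, 20.0]]
mask = [1.0, 1.0, 0.0, 0.0]
query = [0.5, 0.5]
enhanced = enhance_query(query, features, mask)
```

Pixels with a mask value near zero contribute almost nothing to the pooled vector, so the query is informed mainly by features inside the predicted text region. A practical version would operate on batched tensors rather than Python lists.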

Code Repositories

opendrivelab/elm (PyTorch, mentioned in GitHub)
retsuh-bqw/SRFormer-Text-Det (Official, PyTorch, mentioned in GitHub)

Benchmarks

Benchmark | Methodology | Metrics
scene-text-detection-on-ic19-art | SRFormer (ResNet-50) | H-Mean: 79.3%
scene-text-detection-on-scut-ctw1500 | SRFormer (ResNet-50) | F-Measure: 89.6%, Precision: 91.6%, Recall: 87.7%
scene-text-detection-on-total-text | SRFormer (ResNet-50) | F-Measure: 90.0%, Precision: 92.2%, Recall: 87.9%
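
As a quick consistency check, the F-Measure values listed above match the harmonic mean of the listed precision and recall. The helper name `f_measure` below is chosen for illustration; the formula is the standard F1 definition, which is assumed to be the one these benchmarks use.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as listed for SRFormer (ResNet-50), in percent.
ctw1500 = round(f_measure(91.6, 87.7), 1)     # 89.6
total_text = round(f_measure(92.2, 87.9), 1)  # 90.0
```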
