Look Back Again: Dual Parallel Attention Network for Accurate and Robust Scene Text Recognition

Junbo Guo, Hongtao Xie, Guoqing Jin, Zilong Fu

Abstract

Parallel-decoupled encoder-decoder (PDED) frameworks have become a popular choice in scene text recognition because of their flexibility and efficiency. However, the parallel positional attention module (PPAM) used in this kind of framework suffers from inconsistent information content between queries and keys (queries: position information; keys: context and position information), so visual misalignment tends to appear on hard samples (e.g., blurred texts, irregular texts, or low-quality images). To tackle this issue, we propose a dual parallel attention network (DPAN), in which a newly designed parallel context attention module (PCAM) is cascaded with the original PPAM and uses linguistic contextual information to compensate for the information inconsistency between queries and keys. Specifically, PCAM takes the visual features from PPAM as inputs and applies a bidirectional language model to enhance them with linguistic context, producing the queries. In this way, the information content of the queries and keys is consistent in PCAM, which helps generate more precise visual glimpses and improves the accuracy and robustness of the entire PDED framework. Experimental results verify the effectiveness of the proposed PCAM and show the necessity of keeping the information content of queries and keys consistent in the attention mechanism. On six benchmarks covering regular and irregular text, DPAN surpasses the existing leading methods by large margins, achieving new state-of-the-art performance. The code is available at https://github.com/Jackandrome/DPAN.
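The cascade described in the abstract can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the sinusoidal position codes, the helper names (`attend`, `toy_bidirectional_context`, `dpan_sketch`), the feature sizes, and in particular the neighbour-averaging function standing in for the bidirectional language model are all assumptions made for illustration. It only shows the data flow: PPAM queries carry position information alone, while PCAM queries are built from the PPAM glimpses so that they carry context as well.

```python
import math
import random


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Scaled dot-product attention. Each query is independent of the
    others, so all decoding steps could be computed in parallel."""
    d = len(keys[0])
    glimpses, weights = [], []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in keys]
        w = softmax(scores)
        weights.append(w)
        glimpses.append([sum(wj * v[c] for wj, v in zip(w, values))
                         for c in range(len(values[0]))])
    return glimpses, weights


def positional_encoding(length, dim):
    """Sinusoidal position codes (an assumed choice, not taken from the paper)."""
    return [[math.sin(p / 10000 ** (i / dim)) if i % 2 == 0
             else math.cos(p / 10000 ** ((i - 1) / dim))
             for i in range(dim)] for p in range(length)]


def toy_bidirectional_context(glimpses):
    """Hypothetical stand-in for the bidirectional language model:
    mixes each step with its left and right neighbours."""
    n = len(glimpses)
    out = []
    for t, g in enumerate(glimpses):
        left = glimpses[t - 1] if t > 0 else g
        right = glimpses[t + 1] if t < n - 1 else g
        out.append([(a + b + c) / 3 for a, b, c in zip(left, g, right)])
    return out


def dpan_sketch(visual, max_len):
    dim = len(visual[0])
    # Keys carry context (visual content) plus position information.
    key_pos = positional_encoding(len(visual), dim)
    keys = [[v + p for v, p in zip(row, pos)] for row, pos in zip(visual, key_pos)]
    # PPAM: queries hold position information only.
    query_pos = positional_encoding(max_len, dim)
    ppam_glimpses, _ = attend(query_pos, keys, visual)
    # PCAM: queries are PPAM glimpses enriched with context, plus position
    # (adding position here is an assumption), so queries and keys now
    # carry the same kinds of information.
    enriched = toy_bidirectional_context(ppam_glimpses)
    pcam_queries = [[e + p for e, p in zip(row, pos)]
                    for row, pos in zip(enriched, query_pos)]
    pcam_glimpses, pcam_weights = attend(pcam_queries, keys, visual)
    return ppam_glimpses, pcam_glimpses, pcam_weights


random.seed(0)
# 12 flattened visual feature vectors of dimension 8 (arbitrary toy sizes).
visual = [[random.uniform(-1, 1) for _ in range(8)] for _ in range(12)]
ppam, pcam, weights = dpan_sketch(visual, max_len=5)
```

Each row of `weights` is an attention map over the visual features for one character position; in the real model these maps are what would become better aligned once the queries carry linguistic context.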

Benchmarks

Benchmark                            Methodology  Metrics
scene-text-recognition-on-cute80     DPAN         Accuracy: 91.9
scene-text-recognition-on-icdar2013  DPAN         Accuracy: 97.7
scene-text-recognition-on-icdar2015  DPAN         Accuracy: 85.5
scene-text-recognition-on-iiit5k     DPAN         Accuracy: 96.2
scene-text-recognition-on-svt        DPAN         Accuracy: 93.9
scene-text-recognition-on-svtp       DPAN         Accuracy: 89.0
