4 months ago

Simplify the Usage of Lexicon in Chinese NER

Ruotian Ma; Minlong Peng; Qi Zhang; Xuanjing Huang

Abstract

Recently, many works have tried to improve the performance of Chinese named entity recognition (NER) using word lexicons. As a representative example, Lattice-LSTM (Zhang and Yang, 2018) has achieved new benchmark results on several public Chinese NER datasets. However, Lattice-LSTM has a complex model architecture, which limits its application in many industrial areas where real-time NER responses are needed. In this work, we propose a simple but effective method for incorporating the word lexicon into the character representations. This method avoids designing a complicated sequence modeling architecture, and for any neural NER model, it requires only a subtle adjustment of the character representation layer to introduce the lexicon information. Experimental studies on four benchmark Chinese NER datasets show that our method runs inference up to 6.15 times faster than state-of-the-art methods while also achieving better performance. The experimental results also show that the proposed method can be easily combined with pre-trained models like BERT.
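The abstract does not spell out how lexicon information reaches the character representation layer. As a rough illustration only, the sketch below assumes one plausible scheme: every lexicon word that matches a span of the sentence is attached to the characters it covers, grouped by whether the character begins (B), sits inside (M), ends (E), or alone forms (S) the word. The function name `lexicon_features`, the `max_word_len` parameter, and the toy lexicon are hypothetical and are not taken from the official repository.

```python
from typing import Dict, List, Set


def lexicon_features(
    sentence: str, lexicon: Set[str], max_word_len: int = 4
) -> List[Dict[str, Set[str]]]:
    """Collect, for each character, the lexicon words that cover it.

    Assumption (not stated in the abstract): matched words are grouped by
    the position of the character inside the word (B/M/E/S).
    """
    feats = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        # Try every span starting here, up to max_word_len characters long.
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                feats[start]["S"].add(word)
            else:
                feats[start]["B"].add(word)
                feats[end - 1]["E"].add(word)
                for mid in range(start + 1, end - 1):
                    feats[mid]["M"].add(word)
    return feats


# Toy lexicon, invented for this example.
toy_lexicon = {"南京", "南京市", "市长", "长江", "大桥", "长江大桥", "江"}
feats = lexicon_features("南京市长江大桥", toy_lexicon)
```

In a full model, each word set would typically be embedded, pooled into a fixed-size vector, and concatenated with the character embedding before the sequence encoder. Because only the input layer changes, the encoder itself (an LSTM, for example) can stay as it is, which is consistent with the "subtle adjustment" the abstract describes.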

Code Repositories

v-mipeng/LexiconAugmentedNER (official, PyTorch)
changle2018/LexionAN-master (PyTorch)

Benchmarks

Benchmark | Methodology | Metrics
chinese-named-entity-recognition-on-msra | LSTM + Lexicon augment | F1: 93.5
chinese-named-entity-recognition-on-ontonotes | LSTM + Lexicon augment | F1: 75.54
chinese-named-entity-recognition-on-resume | LSTM + Lexicon augment | F1: 95.59
chinese-named-entity-recognition-on-weibo-ner | LSTM + Lexicon augment | F1: 61.24
