Language Model Pre-Training with Sparse Latent Typing

Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, Chengxiang Zhai, Heng Ji

Abstract

Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and have not sought to learn interpretable latent-level representations of sentences. In this paper, we push language models toward a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model learns interpretable latent type categories in a self-supervised manner without using any external knowledge. Moreover, a language model pre-trained with this objective also significantly improves Information Extraction related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.
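To make the idea of sparsely typing tokens more concrete, here is a minimal, illustrative sketch of a token-level latent typing layer. It is not the authors' implementation: the Gumbel-Softmax selection over a learned type codebook, the reserved "null" type, the sparsity penalty, and all names (SparseLatentTyper, scorer, tau) are assumptions made only for this example.

```python
# Illustrative sketch only: a token-level latent typing layer in PyTorch.
# Assumptions (not the authors' exact implementation): a learned codebook of
# `num_types` latent types plus a reserved "null" type for non-keywords,
# Gumbel-Softmax type selection, and a penalty that keeps typed tokens sparse.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLatentTyper(nn.Module):
    def __init__(self, hidden_size: int, num_types: int, tau: float = 1.0):
        super().__init__()
        # Index 0 is reserved as the "null" (untyped) option.
        self.type_embeddings = nn.Embedding(num_types + 1, hidden_size)
        self.scorer = nn.Linear(hidden_size, num_types + 1)
        self.tau = tau

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden) from the LM encoder.
        logits = self.scorer(token_states)                        # (B, T, K+1)
        # Differentiable, near-one-hot type assignment per token.
        assign = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        typed = assign @ self.type_embeddings.weight              # (B, T, H)
        # Sparsity term: expected fraction of tokens assigned a non-null type.
        probs = logits.softmax(dim=-1)
        sparsity_loss = probs[..., 1:].sum(dim=-1).mean()
        # The typed representations would be combined with the encoder states
        # and trained jointly with the usual reconstruction / LM loss.
        return token_states + typed, assign, sparsity_loss
```

In this sketch, reserving a dedicated null type is what makes the behavior sparse: the penalty pushes ordinary tokens toward it, so only keyword-like tokens end up carrying one of the learned latent types.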

Code Repositories

renll/sparselt (Official, PyTorch), mentioned in GitHub

Benchmarks

Benchmark: few-shot-ner-on-few-nerd-inter
Methodology: BERT-SparseLT + CONTaiNER
Metrics (F1):
  10 way 1~2 shot: 52.75
  10 way 5~10 shot: 62.43
  5 way 1~2 shot: 57.14
  5 way 5~10 shot: 66.17

Benchmark: few-shot-ner-on-few-nerd-intra
Methodology: BERT-SparseLT + CONTaiNER
Metrics (F1):
  10 way 1~2 shot: 40.48
  10 way 5~10 shot: 53.04
  5 way 1~2 shot: 47.20
  5 way 5~10 shot: 59.67
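For reference, the "N way K~2K shot" settings above follow the Few-NERD episode protocol: each evaluation episode samples N entity types, a support set containing between K and 2K annotated examples per type, and a held-out query set scored with F1. The snippet below is a deliberately simplified, hypothetical episode sampler written under those assumptions; the official Few-NERD tooling uses a greedy sentence-level sampler that handles sentences containing multiple entity types, and every name here (sample_episode, sentences_by_type) is illustrative.

```python
import random
from typing import Dict, List, Tuple


def sample_episode(
    sentences_by_type: Dict[str, List[str]],
    n_way: int = 5,
    k_shot: int = 1,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Hypothetical N-way K~2K-shot episode sampler (not the official Few-NERD tool).

    `sentences_by_type` maps an entity type to sentences annotated with that type.
    Returns a support set (type -> between K and 2K sentences) and a flat query list.
    """
    types = random.sample(sorted(sentences_by_type), n_way)
    support: Dict[str, List[str]] = {}
    query: List[str] = []
    for t in types:
        pool = list(sentences_by_type[t])
        random.shuffle(pool)
        n_support = random.randint(k_shot, 2 * k_shot)    # the "K~2K" part
        support[t] = pool[:n_support]
        query.extend(pool[n_support:n_support + k_shot])  # held-out queries
    return support, query
```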
