Language Model Pre-Training with Sparse Latent Typing

Liliang Ren, Zixuan Zhang, Han Wang, Clare R. Voss, Chengxiang Zhai, Heng Ji

Abstract

Modern large-scale Pre-trained Language Models (PLMs) have achieved tremendous success on a wide range of downstream tasks. However, most LM pre-training objectives focus only on text reconstruction and have not sought to learn interpretable latent-level representations of sentences. In this paper, we push language models toward a deeper understanding of sentences by proposing a new pre-training objective, Sparse Latent Typing, which enables the model to sparsely extract sentence-level keywords with diverse latent types. Experimental results show that our model learns interpretable latent type categories in a self-supervised manner without using any external knowledge. Moreover, a language model pre-trained with this objective also significantly improves Information Extraction related downstream tasks in both supervised and few-shot settings. Our code is publicly available at: https://github.com/renll/SparseLT.
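To make the idea of sparsely typing tokens more concrete, here is a minimal, illustrative sketch of a token-level latent typing layer. It is not the authors' implementation: the Gumbel-Softmax selection over a learned type codebook, the reserved "null" type, the sparsity penalty, and all names (SparseLatentTyper, scorer, tau) are assumptions made only for this example.

```python
# Illustrative sketch only: a token-level latent typing layer in PyTorch.
# Assumptions (not the authors' exact implementation): a learned codebook of
# `num_types` latent types plus a reserved "null" type for non-keywords,
# Gumbel-Softmax type selection, and a penalty that keeps typed tokens sparse.
import torch
import torch.nn as nn
import torch.nn.functional as F


class SparseLatentTyper(nn.Module):
    def __init__(self, hidden_size: int, num_types: int, tau: float = 1.0):
        super().__init__()
        # Index 0 is reserved as the "null" (untyped) option.
        self.type_embeddings = nn.Embedding(num_types + 1, hidden_size)
        self.scorer = nn.Linear(hidden_size, num_types + 1)
        self.tau = tau

    def forward(self, token_states: torch.Tensor):
        # token_states: (batch, seq_len, hidden) from the LM encoder.
        logits = self.scorer(token_states)                        # (B, T, K+1)
        # Differentiable, near-one-hot type assignment per token.
        assign = F.gumbel_softmax(logits, tau=self.tau, hard=True)
        typed = assign @ self.type_embeddings.weight              # (B, T, H)
        # Sparsity term: expected fraction of tokens assigned a non-null type.
        probs = logits.softmax(dim=-1)
        sparsity_loss = probs[..., 1:].sum(dim=-1).mean()
        # The typed representations would be combined with the encoder states
        # and trained jointly with the usual reconstruction / LM loss.
        return token_states + typed, assign, sparsity_loss
```

In this sketch, reserving a dedicated null type is what makes the behavior sparse: the penalty pushes ordinary tokens toward it, so only keyword-like tokens end up carrying one of the learned latent types.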

Code Repositories

renll/sparselt (Official, PyTorch), mentioned in GitHub

Benchmarks

Benchmark: few-shot-ner-on-few-nerd-inter
Methodology: BERT-SparseLT + CONTaiNER
Metrics (F1):
  10 way 1~2 shot: 52.75
  10 way 5~10 shot: 62.43
  5 way 1~2 shot: 57.14
  5 way 5~10 shot: 66.17

Benchmark: few-shot-ner-on-few-nerd-intra
Methodology: BERT-SparseLT + CONTaiNER
Metrics (F1):
  10 way 1~2 shot: 40.48
  10 way 5~10 shot: 53.04
  5 way 1~2 shot: 47.20
  5 way 5~10 shot: 59.67
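For reference, the "N way K~2K shot" settings above follow the Few-NERD episode protocol: each evaluation episode samples N entity types, a support set containing between K and 2K annotated examples per type, and a held-out query set scored with F1. The snippet below is a deliberately simplified, hypothetical episode sampler written under those assumptions; the official Few-NERD tooling uses a greedy sentence-level sampler that handles sentences containing multiple entity types, and every name here (sample_episode, sentences_by_type) is illustrative.

```python
import random
from typing import Dict, List, Tuple


def sample_episode(
    sentences_by_type: Dict[str, List[str]],
    n_way: int = 5,
    k_shot: int = 1,
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Hypothetical N-way K~2K-shot episode sampler (not the official Few-NERD tool).

    `sentences_by_type` maps an entity type to sentences annotated with that type.
    Returns a support set (type -> between K and 2K sentences) and a flat query list.
    """
    types = random.sample(sorted(sentences_by_type), n_way)
    support: Dict[str, List[str]] = {}
    query: List[str] = []
    for t in types:
        pool = list(sentences_by_type[t])
        random.shuffle(pool)
        n_support = random.randint(k_shot, 2 * k_shot)    # the "K~2K" part
        support[t] = pool[:n_support]
        query.extend(pool[n_support:n_support + k_shot])  # held-out queries
    return support, query
```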
