A La Carte Embedding: Cheap but Effective Induction of Semantic Feature Vectors

Mikhail Khodak; Nikunj Saunshi; Yingyu Liang; Tengyu Ma; Brandon Stewart; Sanjeev Arora

Abstract

Motivations like domain adaptation, transfer learning, and feature learning have fueled interest in inducing embeddings for rare or unseen words, n-grams, synsets, and other textual features. This paper introduces a la carte embedding, a simple and general alternative to the usual word2vec-based approaches for building such representations that is based upon recent theoretical results for GloVe-like embeddings. Our method relies mainly on a linear transformation that is efficiently learnable using pretrained word vectors and linear regression. This transform is applicable on the fly in the future when a new text feature or rare word is encountered, even if only a single usage example is available. We introduce a new dataset showing how the a la carte method requires fewer examples of words in context to learn high-quality embeddings and we obtain state-of-the-art results on a nonce task and some unsupervised document classification tasks.
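
The abstract describes the core of the method as a linear transformation learned from pretrained word vectors by linear regression, then applied on the fly to new features. The sketch below illustrates that idea under stated assumptions and is not the authors' implementation: it assumes the regression maps the average of a word's context word vectors to that word's pretrained vector, and the function names `learn_transform` and `induce_embedding` are hypothetical. The data in the demo is synthetic.

```python
import numpy as np


def learn_transform(word_vecs, context_avgs):
    """Fit a matrix A by least squares so that A @ context_avgs[i] ~ word_vecs[i].

    word_vecs:    (n_words, dim) pretrained vectors (regression targets).
    context_avgs: (n_words, dim) average context vector per word (assumed input).
    """
    # lstsq solves context_avgs @ X ~ word_vecs, so X is the transpose of A.
    X, *_ = np.linalg.lstsq(context_avgs, word_vecs, rcond=None)
    return X.T


def induce_embedding(A, context_vectors):
    """Embed a new feature from the vectors of the words seen around it.

    context_vectors: (n_context_words, dim); a single usage example is enough.
    """
    return A @ np.mean(context_vectors, axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, n_words = 8, 200

    # Synthetic stand-ins for pretrained vectors and their context averages.
    A_true = rng.normal(size=(dim, dim))
    context_avgs = rng.normal(size=(n_words, dim))
    word_vecs = context_avgs @ A_true.T

    A = learn_transform(word_vecs, context_avgs)

    # One usage example of an unseen word: five context word vectors.
    new_context = rng.normal(size=(5, dim))
    print(induce_embedding(A, new_context).shape)
```

Because the transform is fit once and then reused, embedding a new word or n-gram costs only an average and one matrix-vector product.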

Code Repositories

NLPrinceton/ALaCarte (official; mentioned in GitHub)

Benchmarks

Benchmark                               Methodology   Metrics
sentiment-analysis-on-cr                byte mLSTM7   Accuracy: 90.6
sentiment-analysis-on-mpqa              byte mLSTM7   Accuracy: 88.8
sentiment-analysis-on-mr                byte mLSTM7   Accuracy: 86.8
sentiment-analysis-on-sst-2-binary      byte mLSTM7   Accuracy: 91.7
sentiment-analysis-on-sst-5-fine-grained  byte mLSTM7   Accuracy: 54.6
subjectivity-analysis-on-subj           byte mLSTM7   Accuracy: 94.7
text-classification-on-trec-6           byte mLSTM7   Error: 9.6
