All-but-the-Top: Simple and Effective Postprocessing for Word Representations
Jiaqi Mu; Suma Bhat; Pramod Viswanath

Abstract
Real-valued word representations have transformed NLP applications; popular examples are word2vec and GloVe, recognized for their ability to capture linguistic regularities. In this paper, we demonstrate a *very simple*, and yet counter-intuitive, postprocessing technique -- eliminate the common mean vector and a few top dominating directions from the word vectors -- that renders off-the-shelf representations *even stronger*. The postprocessing is empirically validated on a variety of lexical-level intrinsic tasks (word similarity, concept categorization, word analogy) and sentence-level tasks (semantic textual similarity and text classification) on multiple datasets and with a variety of representation methods and hyperparameter choices in multiple languages; in each case, the processed representations are consistently better than the original ones.
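The postprocessing described above comes down to a few lines of linear algebra: center the vectors, find the top principal directions, and project them out. The sketch below is a minimal NumPy illustration; the function name and the `vectors` array are our own, and the rule of thumb `d ≈ dim / 100` for the number of removed directions follows the paper's recommendation.

```python
import numpy as np

def all_but_the_top(vectors: np.ndarray, d: int) -> np.ndarray:
    """Remove the common mean vector and the top-d dominating directions.

    vectors: (n_words, dim) array of pretrained embeddings (e.g. GloVe).
    d: number of principal directions to remove (roughly dim / 100).
    """
    # Step 1: subtract the common mean vector.
    centered = vectors - vectors.mean(axis=0)

    # Step 2: find the dominating directions via SVD of the centered
    # matrix; rows of Vt are principal directions, strongest first.
    _, _, Vt = np.linalg.svd(centered, full_matrices=False)
    top = Vt[:d]  # shape (d, dim)

    # Step 3: project every vector off the top-d subspace.
    return centered - centered @ top.T @ top

# Example: postprocess 300-dimensional vectors, removing d = 3 directions.
embeddings = np.random.randn(10000, 300).astype(np.float32)
processed = all_but_the_top(embeddings, d=3)
```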
Benchmarks
| Benchmark | Methodology | Metric |
|---|---|---|
| Sentiment Analysis on MR | GRU-RNN-WORD2VEC | Accuracy: 78.26 |
| Sentiment Analysis on SST-5 (fine-grained) | GRU-RNN-WORD2VEC | Accuracy: 45.02 |
| Subjectivity Analysis on SUBJ | GRU-RNN-GLOVE | Accuracy: 91.85 |
| Text Classification on TREC-6 | GRU-RNN-GLOVE | Error: 7.0 |