Topic Modeling in Embedding Spaces

Adji B. Dieng; Francisco J. R. Ruiz; David M. Blei

Abstract

Topic modeling analyzes documents to learn meaningful patterns of words. However, existing topic models fail to learn interpretable topics when working with large and heavy-tailed vocabularies. To this end, we develop the Embedded Topic Model (ETM), a generative model of documents that marries traditional topic models with word embeddings. In particular, it models each word with a categorical distribution whose natural parameter is the inner product between a word embedding and an embedding of its assigned topic. To fit the ETM, we develop an efficient amortized variational inference algorithm. The ETM discovers interpretable topics even with large vocabularies that include rare words and stop words. It outperforms existing document models, such as latent Dirichlet allocation (LDA), in terms of both topic quality and predictive performance.
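The word distribution described in the abstract can be illustrated with a small sketch. For a topic k, the inner product between each word embedding and the topic embedding gives one logit per vocabulary word, and a softmax over those logits yields the categorical distribution over words. The function names, the two-dimensional embeddings, and the three-word vocabulary below are hypothetical choices for illustration, not taken from the authors' code, and the sketch covers only this one step, not the variational inference algorithm.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def topic_word_distribution(word_embeddings, topic_embedding):
    """Categorical distribution over the vocabulary for one topic.

    The natural parameter (logit) of each word is the inner product
    between that word's embedding and the topic's embedding.
    """
    logits = [
        sum(w * t for w, t in zip(word_vec, topic_embedding))
        for word_vec in word_embeddings
    ]
    return softmax(logits)


# Toy values, assumed purely for illustration.
word_embeddings = [
    [1.0, 0.0],  # word 0
    [0.0, 1.0],  # word 1
    [1.0, 1.0],  # word 2
]
topic_embedding = [2.0, 0.0]

beta = topic_word_distribution(word_embeddings, topic_embedding)
print(beta)
```

Words whose embeddings align with the topic embedding receive more probability mass. In this toy setup, words 0 and 2 have the same inner product with the topic, so they receive equal probability, and both outrank word 1.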

Code Repositories

Repository                      Framework    Notes
adjidieng/ETM                   pytorch      Official; mentioned in GitHub
cran/topicmodels.etm            pytorch      Mentioned in GitHub
lffloyd/embedded-topic-model    not listed   Mentioned in GitHub
hjzzang/ETM                     pytorch      Mentioned in GitHub
adjidieng/DETM                  pytorch      Mentioned in GitHub
bnosac/ETM                      pytorch      Mentioned in GitHub
yukisea/ETM                     pytorch      Mentioned in GitHub
migrationsKB/MGKB               pytorch      Mentioned in GitHub
zll17/Neural_Topic_Models       pytorch      Mentioned in GitHub
fumiyo0607/ETM                  pytorch      Mentioned in GitHub
bahareharandizade/keyetm        pytorch      Mentioned in GitHub

Benchmarks

Benchmark                       Methodology   Metrics
topic-models-on-20newsgroups    ETM           C_v: 0.51
topic-models-on-ag-news         ETM           C_v: 0.41; NPMI: 0.02
