Def2Vec: Extensible Word Embeddings from Dictionary Definitions

Roberto Tedesco, Vincenzo Scotti, Irene Morazzoni

Abstract

Def2Vec introduces a novel paradigm for word embeddings, leveraging dictionary definitions to learn semantic representations. By constructing term-document matrices from definitions and applying Latent Semantic Analysis (LSA), Def2Vec generates embeddings that offer both strong performance and extensibility. In evaluations encompassing Part-of-Speech tagging, Named Entity Recognition, chunking, and semantic similarity, Def2Vec often matches or surpasses state-of-the-art models such as Word2Vec, GloVe, and fastText. The second matrix of the LSA factorisation enables efficient extension of the embeddings to out-of-vocabulary words. By effectively reconciling the advantages of dictionary definitions with LSA-based embeddings, Def2Vec yields informative semantic representations, especially considering its reduced data requirements. This paper advances the understanding of word embedding generation by incorporating structured lexical information and efficient embedding extension.
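The pipeline described above can be sketched in a few lines: build a term-document matrix in which each column is a word's definition, factorise it with truncated SVD (LSA), take each word's latent document vector as its embedding, and reuse the term matrix from the factorisation to fold in out-of-vocabulary words from their definitions. This is a minimal toy sketch, not the authors' implementation; the toy dictionary, the bag-of-words weighting, and the particular scaling convention are illustrative assumptions.

```python
import numpy as np

# Toy dictionary (hypothetical data): word -> its definition.
definitions = {
    "cat": "small domesticated feline animal",
    "dog": "domesticated canine animal",
    "car": "road vehicle with an engine",
}

# Vocabulary of terms appearing in the definitions.
terms = sorted({t for d in definitions.values() for t in d.split()})
t_idx = {t: i for i, t in enumerate(terms)}

# Term-document matrix X: one column per defined word, raw counts
# (a real system would typically use TF-IDF weighting instead).
X = np.zeros((len(terms), len(definitions)))
for j, d in enumerate(definitions.values()):
    for t in d.split():
        X[t_idx[t], j] += 1.0

# Truncated SVD (LSA): X ~ U S V^T with k latent dimensions.
k = 2
U, s, Vt = np.linalg.svd(X, full_matrices=False)
U, s, Vt = U[:, :k], s[:k], Vt[:k, :]

# Each defined word is represented by its definition's latent
# document vector, i.e. a column of S V^T.
word_emb = {w: s * Vt[:, j] for j, w in enumerate(definitions)}

def embed_oov(definition: str) -> np.ndarray:
    """Fold in an out-of-vocabulary word: project its definition's
    term-count vector with the term matrix U (d = U^T x), which is
    directly comparable with the columns of S V^T."""
    x = np.zeros(len(terms))
    for t in definition.split():
        if t in t_idx:  # terms unseen in any definition are dropped
            x[t_idx[t]] += 1.0
    return U.T @ x

vec = embed_oov("small feline animal")  # k-dimensional embedding
```

Because U has orthonormal columns, folding in a definition that is already a column of X recovers exactly that word's embedding, which is what makes the extension mechanism consistent with the original vocabulary.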

Benchmarks

Chunking on CoNLL-2003 (Def2Vec)
  AUC: 93.07
  Accuracy: 77.69
  F1: 81.45
  Precision: 86.56
  Recall: 77.69

NER on CoNLL-2003 (Def2Vec)
  AUC: 96.28
  Accuracy: 71.98
  F1: 83.09
  Precision: 99.28
  Recall: 71.98

Semantic Textual Similarity on STS Benchmark (Def2Vec)
  Spearman Correlation: 0.6372
