SimCSE: Simple Contrastive Learning of Sentence Embeddings

Tianyu Gao, Xingcheng Yao, Danqi Chen

Abstract

This paper presents SimCSE, a simple contrastive learning framework that greatly advances state-of-the-art sentence embeddings. We first describe an unsupervised approach, which takes an input sentence and predicts itself in a contrastive objective, with only standard dropout used as noise. This simple method works surprisingly well, performing on par with previous supervised counterparts. We find that dropout acts as minimal data augmentation, and removing it leads to a representation collapse. Then, we propose a supervised approach, which incorporates annotated pairs from natural language inference datasets into our contrastive learning framework by using "entailment" pairs as positives and "contradiction" pairs as hard negatives. We evaluate SimCSE on standard semantic textual similarity (STS) tasks, and our unsupervised and supervised models using BERT base achieve an average of 76.3% and 81.6% Spearman's correlation respectively, a 4.2% and 2.2% improvement compared to the previous best results. We also show -- both theoretically and empirically -- that the contrastive learning objective regularizes pre-trained embeddings' anisotropic space to be more uniform, and it better aligns positive pairs when supervised signals are available.
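The contrastive objective described in the abstract can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation: the function names, the temperature value of 0.05, and the use of dropout on raw feature vectors (standing in for a Transformer encoder with dropout) are all assumptions made for the example. Each sentence's positive is its own second view, the other sentences in the batch serve as negatives, and optional hard negatives (for example, "contradiction" hypotheses) are appended as extra columns.

```python
import numpy as np

def cosine_sim_matrix(a, b):
    """Pairwise cosine similarity between rows of a and rows of b."""
    a = a / np.maximum(np.linalg.norm(a, axis=1, keepdims=True), 1e-12)
    b = b / np.maximum(np.linalg.norm(b, axis=1, keepdims=True), 1e-12)
    return a @ b.T

def contrastive_loss(anchors, positives, hard_negatives=None, temperature=0.05):
    """Cross-entropy over in-batch similarities; row i's target is column i.

    temperature=0.05 is an assumed value for illustration.
    """
    logits = cosine_sim_matrix(anchors, positives) / temperature  # (N, N)
    if hard_negatives is not None:
        # Hard negatives only enlarge the denominator of the softmax.
        neg = cosine_sim_matrix(anchors, hard_negatives) / temperature
        logits = np.concatenate([logits, neg], axis=1)  # (N, 2N)
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(anchors.shape[0])
    return float(-log_probs[idx, idx].mean())

def dropout_view(x, rate, rng):
    """Hypothetical stand-in for an encoder pass with dropout enabled."""
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

# Unsupervised setting: two dropout views of the same inputs form positive pairs.
rng = np.random.default_rng(0)
features = rng.normal(size=(8, 32))  # placeholder sentence features
view_a = dropout_view(features, 0.1, rng)
view_b = dropout_view(features, 0.1, rng)
print("unsupervised loss:", contrastive_loss(view_a, view_b))
```

In the supervised setting, `anchors` would hold premise embeddings, `positives` the matching entailment hypotheses, and `hard_negatives` the contradiction hypotheses, under the same assumptions.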

Code Repositories

princeton-nlp/SimCSE (official, PyTorch)
shuxinyin/SimCSE-Pytorch (PyTorch)
dltmddbs100/SimCSE (PyTorch)
bm-k/kosimcse-skt (PyTorch)
wakafengfan/simcse-pytorch (PyTorch)
BM-K/KoSimCSE_SKT (PyTorch)
voidism/diffcse (JAX)
liangyuxin42/SimCSE-reproduce (PyTorch)
BM-K/KoSimCSE (PyTorch)
bhuvanakundumani/SimCSE_unsupervised (PyTorch)
dll-wu/is-cse (PyTorch)
hooman650/supcl-seq (PyTorch)
oneflow-inc/libai
yaushian/msimcse (PyTorch)
mcgill-nlp/llm2vec (PyTorch)
jeongukjae/KR-BERT-SimCSE (TensorFlow)
sulcantonin/text_icalepcs23 (PyTorch)
nlpods/layerattpooler (PyTorch)
cluebenchmark/simclue (PyTorch)
vdogmcgee/SimCSE-Chinese-Pytorch (PyTorch)
shuhewang1998/sim-gpt (PyTorch)

Benchmarks

| Benchmark (semantic textual similarity) | Methodology | Spearman Correlation |
|---|---|---|
| SICK | SimCSE-RoBERTa-large | 0.8195 |
| STS Benchmark | SimCSE-RoBERTa-large | 0.867 |
| STS12 | SimCSE-RoBERTa-base | 0.7016 |
| STS12 | SimCSE-RoBERTa-large | 0.7746 |
| STS13 | SimCSE-RoBERTa-base | 0.8136 |
| STS13 | SimCSE-RoBERTa-large | 0.8727 |
| STS13 | SimCSE-BERT-base | 0.8241 |
| STS14 | SimCSE-RoBERTa-large | 0.8236 |
| STS15 | SimCSE-RoBERTa-large | 0.8666 |
| STS16 | SimCSE-RoBERTa-large | 0.8393 |
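
For readers unfamiliar with the metric, the sketch below shows one way a Spearman correlation between model similarity scores and human ratings could be computed. It is an illustration only: the helper names and the example scores are hypothetical, and it does not reproduce the evaluation pipeline behind the numbers above. Spearman correlation is the Pearson correlation of the ranks, with tied values given their average rank.

```python
import numpy as np

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    x = np.asarray(values, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        # Extend j to the end of the run of values equal to sorted_x[i].
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2.0 + 1.0
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return float(np.corrcoef(average_ranks(x), average_ranks(y))[0, 1])

# Hypothetical data: cosine similarities from a model and gold ratings
# for the same five sentence pairs.
predicted = [0.91, 0.15, 0.48, 0.77, 0.30]
gold = [4.8, 0.5, 3.9, 3.0, 1.2]
print("Spearman correlation:", spearman(predicted, gold))
```

Because only ranks matter, any monotonic rescaling of the model's similarity scores leaves the result unchanged.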
