
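Abstract
Text embeddings are commonly evaluated on a small set of datasets from a single task, which does not cover their possible applications to other tasks. It is unclear whether embeddings that are state of the art on semantic textual similarity (STS) can be applied equally well to other tasks such as clustering or reranking. This makes progress in the field difficult to track, as new models are constantly proposed without proper evaluation. To address this problem, we introduce the Massive Text Embedding Benchmark (MTEB). MTEB spans 8 embedding tasks covering a total of 58 datasets and 112 languages. By benchmarking 33 models on MTEB, we establish the most comprehensive benchmark of text embeddings to date. We find that no particular text embedding method dominates across all tasks. This suggests that the field has yet to converge on a universal text embedding method and scale it up sufficiently to provide state-of-the-art results on all embedding tasks. MTEB comes with open-source code and a public leaderboard at https://github.com/embeddings-benchmark/mteb.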
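The abstract contrasts STS performance with performance on other tasks. As a minimal sketch, assuming the common STS protocol of scoring each sentence pair by the cosine similarity of its two embeddings and reporting the Spearman rank correlation with human ratings (scaled by 100, like the "Spearman Correlation" values in the benchmark table), the scoring step can be written in plain Python. The function names and toy vectors are illustrative only; nothing here calls the mteb package.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def sts_score(pairs, gold):
    """Correlate cosine similarities of embedding pairs with gold ratings (x100)."""
    predicted = [cosine(u, v) for u, v in pairs]
    return 100 * spearman(predicted, gold)


if __name__ == "__main__":
    # Hypothetical 2-d "embeddings" and made-up human similarity ratings.
    toy_pairs = [([1.0, 0.0], [1.0, 0.1]), ([1.0, 0.0], [0.0, 1.0]), ([1.0, 1.0], [1.0, 0.5])]
    toy_gold = [4.8, 0.5, 3.9]
    print(round(sts_score(toy_pairs, toy_gold), 2))
```

Because only ranks enter the correlation, a model is rewarded for ordering pairs like the human ratings, not for matching their absolute scale. In practice a library routine such as SciPy's `spearmanr` would replace the hand-written ranking.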
Code repositories
- climsocana/tecb-de (mentioned in GitHub)
- embeddings-benchmark/mteb (official; pytorch; mentioned in GitHub)
- basf/chemteb (pytorch; mentioned in GitHub)
- wadoodabdul/clinical_ner_benchmark (mentioned in GitHub)
- lyon-nlp/mteb-french (pytorch; mentioned in GitHub)
Benchmarks
| Benchmark | Method | Metric: Score |
|---|---|---|
| information-retrieval-on-mteb | SGPT-5.8B-msmarco | nDCG@10: 50.25  | 
| semantic-textual-similarity-on-mteb | Ada Similarity | Spearman Correlation: 78.6  | 
| semantic-textual-similarity-on-mteb | GTR-Large | Spearman Correlation: 78.19  | 
| semantic-textual-similarity-on-mteb | SGPT-2.7B-msmarco | Spearman Correlation: 76.83  | 
| semantic-textual-similarity-on-mteb | GTR-Base | Spearman Correlation: 77.07  | 
| semantic-textual-similarity-on-mteb | ST5-XXL | Spearman Correlation: 82.63  | 
| semantic-textual-similarity-on-mteb | SimCSE-BERT-unsup | Spearman Correlation: 74.33  | 
| semantic-textual-similarity-on-mteb | Komninos | Spearman Correlation: 62.47  | 
| semantic-textual-similarity-on-mteb | SGPT-5.8B-nli | Spearman Correlation: 80.53  | 
| semantic-textual-similarity-on-mteb | SGPT-5.8B-msmarco | Spearman Correlation: 78.1  | 
| semantic-textual-similarity-on-mteb | SPECTER | Spearman Correlation: 61.02  | 
| semantic-textual-similarity-on-mteb | GTR-XXL | Spearman Correlation: 78.38  | 
| semantic-textual-similarity-on-mteb | MiniLM-L6 | Spearman Correlation: 78.92  | 
| semantic-textual-similarity-on-mteb | SimCSE-BERT-sup | Spearman Correlation: 79.12  | 
| semantic-textual-similarity-on-mteb | LASER2 | Spearman Correlation: 55.32  | 
| semantic-textual-similarity-on-mteb | coCondenser-msmarco | Spearman Correlation: 76.47  | 
| semantic-textual-similarity-on-mteb | GTR-XL | Spearman Correlation: 77.8  | 
| semantic-textual-similarity-on-mteb | ST5-Large | Spearman Correlation: 81.83  | 
| semantic-textual-similarity-on-mteb | LaBSE | Spearman Correlation: 70.8  | 
| semantic-textual-similarity-on-mteb | MPNet | Spearman Correlation: 80.28  | 
| semantic-textual-similarity-on-mteb | BERT | Spearman Correlation: 54.36  | 
| semantic-textual-similarity-on-mteb | SGPT-1.3B-msmarco | Spearman Correlation: 75.74  | 
| semantic-textual-similarity-on-mteb | SGPT-125M-msmarco | Spearman Correlation: 73.41  | 
| semantic-textual-similarity-on-mteb | ST5-XL | Spearman Correlation: 81.66  | 
| semantic-textual-similarity-on-mteb | MiniLM-L12 | Spearman Correlation: 79.8  | 
| semantic-textual-similarity-on-mteb | ST5-Base | Spearman Correlation: 81.14  | 
| semantic-textual-similarity-on-mteb | MPNet-multilingual | Spearman Correlation: 80.73  | 
| semantic-textual-similarity-on-mteb | Glove | Spearman Correlation: 61.85  | 
| semantic-textual-similarity-on-mteb | SGPT-BLOOM-7.1B-msmarco | Spearman Correlation: 77.74  | 
| semantic-textual-similarity-on-mteb | SGPT-125M-nli | Spearman Correlation: 74.71  | 
| text-classification-on-mteb | GTR-Large | Accuracy: 67.14  | 
| text-classification-on-mteb | ST5-XL | Accuracy: 72.84  | 
| text-classification-on-mteb | ST5-XXL | Accuracy: 73.42  | 
| text-classification-on-mteb | LaBSE | Accuracy: 62.71  | 
| text-classification-on-mteb | SGPT-125M-nli | Accuracy: 61.46  | 
| text-classification-on-mteb | SGPT-5.8B-nli | Accuracy: 70.14  | 
| text-classification-on-mteb | Ada Similarity | Accuracy: 70.44  | 
| text-classification-on-mteb | MiniLM-L6 | Accuracy: 63.06  | 
| text-classification-on-mteb | coCondenser-msmarco | Accuracy: 64.71  | 
| text-classification-on-mteb | ST5-Base | Accuracy: 69.81  | 
| text-classification-on-mteb | SGPT-BLOOM-7.1B-msmarco | Accuracy: 66.19  | 
| text-classification-on-mteb | MPNet-multilingual | Accuracy: 67.91  | 
| text-classification-on-mteb | SPECTER | Accuracy: 52.37  | 
| text-classification-on-mteb | GTR-XXL | Accuracy: 67.41  | 
| text-classification-on-mteb | MPNet | Accuracy: 65.07  | 
| text-classification-on-mteb | Komninos | Accuracy: 57.65  | 
| text-classification-on-mteb | SimCSE-BERT-unsup | Accuracy: 62.5  | 
| text-classification-on-mteb | BERT | Accuracy: 61.66  | 
| text-classification-on-mteb | MiniLM-L12-multilingual | Accuracy: 64.3  | 
| text-classification-on-mteb | LASER2 | Accuracy: 53.65  | 
| text-classification-on-mteb | GTR-XL | Accuracy: 67.11  | 
| text-classification-on-mteb | SGPT-125M-msmarco | Accuracy: 60.72  | 
| text-classification-on-mteb | Contriever | Accuracy: 66.68  | 
| text-classification-on-mteb | SimCSE-BERT-sup | Accuracy: 67.32  | 
| text-classification-on-mteb | ST5-Large | Accuracy: 72.31  | 
| text-classification-on-mteb | MiniLM-L12 | Accuracy: 63.21  | 
| text-classification-on-mteb | Glove | Accuracy: 57.29  | 
| text-classification-on-mteb | SGPT-1.3B-msmarco | Accuracy: 66.52  | 
| text-classification-on-mteb | GTR-Base | Accuracy: 65.25  | 
| text-classification-on-mteb | SGPT-2.7B-msmarco | Accuracy: 67.13  | 
| text-classification-on-mteb | SGPT-5.8B-msmarco | Accuracy: 68.13  | 
| text-clustering-on-mteb | SPECTER | V-Measure: 34.06  | 
| text-clustering-on-mteb | coCondenser-msmarco | V-Measure: 37.64  | 
| text-clustering-on-mteb | ST5-XL | V-Measure: 42.34  | 
| text-clustering-on-mteb | SGPT-1.3B-msmarco | V-Measure: 39.92  | 
| text-clustering-on-mteb | GTR-XL | V-Measure: 41.51  | 
| text-clustering-on-mteb | ST5-Base | V-Measure: 40.21  | 
| text-clustering-on-mteb | SGPT-125M-msmarco | V-Measure: 35.79  | 
| text-clustering-on-mteb | MPNet-multilingual | V-Measure: 38.4  | 
| text-clustering-on-mteb | SGPT-125M-nli | V-Measure: 30.95  | 
| text-clustering-on-mteb | Komninos | V-Measure: 26.57  | 
| text-clustering-on-mteb | SGPT-2.7B-msmarco | V-Measure: 39.83  | 
| text-clustering-on-mteb | SGPT-BLOOM-7.1B-msmarco | V-Measure: 38.93  | 
| text-clustering-on-mteb | LASER2 | V-Measure: 15.28  | 
| text-clustering-on-mteb | ST5-Large | V-Measure: 41.65  | 
| text-clustering-on-mteb | SimCSE-BERT-unsup | V-Measure: 29.04  | 
| text-clustering-on-mteb | SGPT-5.8B-nli | V-Measure: 36.98  | 
| text-clustering-on-mteb | Glove | V-Measure: 27.73  | 
| text-clustering-on-mteb | MiniLM-L12 | V-Measure: 41.81  | 
| text-clustering-on-mteb | LaBSE | V-Measure: 29.55  | 
| text-clustering-on-mteb | MiniLM-L6 | V-Measure: 42.35  | 
| text-clustering-on-mteb | BERT | V-Measure: 30.12  | 
| text-clustering-on-mteb | SGPT-5.8B-msmarco | V-Measure: 40.35  | 
| text-clustering-on-mteb | MiniLM-L12-multilingual | V-Measure: 37.14  | 
| text-clustering-on-mteb | MPNet | V-Measure: 43.69  | 
| text-clustering-on-mteb | ST5-XXL | V-Measure: 43.71  | 
| text-clustering-on-mteb | SimCSE-BERT-sup | V-Measure: 33.43  | 
| text-clustering-on-mteb | Ada Similarity | V-Measure: 37.52  | 
| text-clustering-on-mteb | GTR-Base | V-Measure: 38.63  | 
| text-clustering-on-mteb | Contriever | V-Measure: 41.1  | 
| text-clustering-on-mteb | GTR-XXL | V-Measure: 42.42  | 
| text-clustering-on-mteb | GTR-Large | V-Measure: 41.6  | 
| text-retrieval-on-mteb | BERT | nDCG@10: 10.59  | 
| text-retrieval-on-mteb | ST5-XL | nDCG@10: 38.47  | 
| text-retrieval-on-mteb | MPNet-multilingual | nDCG@10: 35.34  | 
| text-retrieval-on-mteb | SPECTER | nDCG@10: 15.88  | 
| text-retrieval-on-mteb | MiniLM-L12 | nDCG@10: 42.69  | 
| text-retrieval-on-mteb | GTR-Large | nDCG@10: 47.42  | 
| text-retrieval-on-mteb | coCondenser-msmarco | nDCG@10: 32.96  | 
| text-retrieval-on-mteb | ST5-Large | nDCG@10: 36.71  | 
| text-retrieval-on-mteb | Glove | nDCG@10: 21.62  | 
| text-retrieval-on-mteb | LaBSE | nDCG@10: 18.99  | 
| text-retrieval-on-mteb | MiniLM-L6 | nDCG@10: 41.95  | 
| text-retrieval-on-mteb | SGPT-5.8B-nli | nDCG@10: 32.34  | 
| text-retrieval-on-mteb | MPNet | nDCG@10: 43.81  | 
| text-retrieval-on-mteb | GTR-Base | nDCG@10: 44.67  | 
| text-retrieval-on-mteb | GTR-XXL | nDCG@10: 48.48  | 
| text-retrieval-on-mteb | LASER2 | nDCG@10: 7.93  | 
| text-retrieval-on-mteb | ST5-XXL | nDCG@10: 42.24  | 
| text-retrieval-on-mteb | SGPT-BLOOM-7.1B-msmarco | nDCG@10: 48.21  | 
| text-retrieval-on-mteb | SGPT-1.3B-msmarco | nDCG@10: 44.49  | 
| text-retrieval-on-mteb | GTR-XL | nDCG@10: 47.96  | 
| text-retrieval-on-mteb | SimCSE-BERT-sup | nDCG@10: 21.82  | 
| text-retrieval-on-mteb | SimCSE-BERT-unsup | nDCG@10: 20.29  | 
| text-retrieval-on-mteb | Komninos | nDCG@10: 21.22  | 
| text-retrieval-on-mteb | SGPT-2.7B-msmarco | nDCG@10: 46.54  | 
| text-retrieval-on-mteb | SGPT-125M-msmarco | nDCG@10: 37.04  | 
| text-retrieval-on-mteb | SGPT-125M-nli | nDCG@10: 20.9  | 
| text-retrieval-on-mteb | SGPT-5.8B-msmarco | nDCG@10: 50.25  | 
| text-retrieval-on-mteb | MiniLM-L12-multilingual | nDCG@10: 32.45  | 
| text-retrieval-on-mteb | ST5-Base | nDCG@10: 33.63  | 
| text-retrieval-on-mteb | Contriever | nDCG@10: 41.88  | 
| text-summarization-on-mteb | LASER2 | Spearman Correlation: 26.8  | 
| text-summarization-on-mteb | Contriever | Spearman Correlation: 30.36  | 
| text-summarization-on-mteb | GTR-XL | Spearman Correlation: 30.21  | 
| text-summarization-on-mteb | ST5-Large | Spearman Correlation: 29.64  | 
| text-summarization-on-mteb | ST5-Base | Spearman Correlation: 31.39  | 
| text-summarization-on-mteb | Glove | Spearman Correlation: 28.87  | 
| text-summarization-on-mteb | MPNet-multilingual | Spearman Correlation: 31.57  | 
| text-summarization-on-mteb | Komninos | Spearman Correlation: 30.49  | 
| text-summarization-on-mteb | SGPT-BLOOM-7.1B-msmarco | Spearman Correlation: 24.99  | 
| text-summarization-on-mteb | GTR-Base | Spearman Correlation: 29.67  | 
| text-summarization-on-mteb | MiniLM-L6 | Spearman Correlation: 30.81  | 
| text-summarization-on-mteb | SimCSE-BERT-unsup | Spearman Correlation: 31.15  | 
| text-summarization-on-mteb | SimCSE-BERT-sup | Spearman Correlation: 23.31  | 
| text-summarization-on-mteb | GTR-XXL | Spearman Correlation: 30.64  | 
| text-summarization-on-mteb | MiniLM-L12 | Spearman Correlation: 27.9  | 
| text-summarization-on-mteb | coCondenser-msmarco | Spearman Correlation: 29.5  | 
| text-summarization-on-mteb | SGPT-5.8B-msmarco | Spearman Correlation: 24.75  | 
| text-summarization-on-mteb | SGPT-125M-nli | Spearman Correlation: 30.26  | 
| text-summarization-on-mteb | MPNet | Spearman Correlation: 27.49  | 
| text-summarization-on-mteb | ST5-XL | Spearman Correlation: 29.91  | 
| text-summarization-on-mteb | Ada Similarity | Spearman Correlation: 26.94  | 
| text-summarization-on-mteb | SGPT-1.3B-msmarco | Spearman Correlation: 25.44  | 
| text-summarization-on-mteb | MiniLM-L12-multilingual | Spearman Correlation: 30.67  | 
| text-summarization-on-mteb | BERT | Spearman Correlation: 29.82  | 
| text-summarization-on-mteb | ST5-XXL | Spearman Correlation: 30.08  | 
| text-summarization-on-mteb | SPECTER | Spearman Correlation: 27.66  |
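The abstract's finding that no text embedding method dominates across all tasks can be checked against the table above. The sketch below copies the scores of five methods from the table (the choice of these five is illustrative) and picks the highest-scoring method per task. Within this subset, ST5-XXL leads STS, classification and clustering, SGPT-5.8B-msmarco leads retrieval, and MPNet-multilingual leads summarization; these are also the top scores in the corresponding rows of the full table.

```python
# Scores copied from the benchmark table above for five of the listed methods.
# Each task uses a different metric (Spearman correlation, accuracy, V-measure,
# nDCG@10), so scores are compared only within a task, never across tasks.
SCORES = {
    "semantic-textual-similarity-on-mteb": {
        "ST5-XXL": 82.63, "SGPT-5.8B-msmarco": 78.1, "GTR-XXL": 78.38,
        "MPNet": 80.28, "MPNet-multilingual": 80.73,
    },
    "text-classification-on-mteb": {
        "ST5-XXL": 73.42, "SGPT-5.8B-msmarco": 68.13, "GTR-XXL": 67.41,
        "MPNet": 65.07, "MPNet-multilingual": 67.91,
    },
    "text-clustering-on-mteb": {
        "ST5-XXL": 43.71, "SGPT-5.8B-msmarco": 40.35, "GTR-XXL": 42.42,
        "MPNet": 43.69, "MPNet-multilingual": 38.4,
    },
    "text-retrieval-on-mteb": {
        "ST5-XXL": 42.24, "SGPT-5.8B-msmarco": 50.25, "GTR-XXL": 48.48,
        "MPNet": 43.81, "MPNet-multilingual": 35.34,
    },
    "text-summarization-on-mteb": {
        "ST5-XXL": 30.08, "SGPT-5.8B-msmarco": 24.75, "GTR-XXL": 30.64,
        "MPNet": 27.49, "MPNet-multilingual": 31.57,
    },
}


def best_per_task(scores):
    """Return {task: (method, score)} for the highest-scoring method in each task."""
    return {
        task: max(results.items(), key=lambda item: item[1])
        for task, results in scores.items()
    }


if __name__ == "__main__":
    winners = best_per_task(SCORES)
    for task, (method, score) in winners.items():
        print(f"{task}: {method} ({score})")
    # More than one distinct winner means no single method dominates this subset.
    print("distinct winners:", len({method for method, _ in winners.values()}))
```

The per-task maximum is deliberately not followed by a cross-task average here: averaging numbers measured in different metrics is a separate design decision that this sketch does not attempt.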