Ting Jiang Shaohan Huang Zhongzhi Luan Deqing Wang Fuzhen Zhuang

Abstract
In recent years, large language models (LLMs) have attracted widespread attention. With their in-context learning capability, LLMs have achieved impressive results across a wide range of natural language processing tasks. However, applying LLMs to sentence embeddings remains an active area of research. This paper proposes an in-context-learning-based method to improve the quality of sentence embeddings. Our approach builds on a prompt-based representation framework for autoregressive models, constructs a demonstration set for in-context learning, and systematically scales LLMs of different sizes. Extensive experiments show that, without any fine-tuning, in-context learning alone enables LLMs to produce high-quality sentence embeddings that are competitive with current mainstream contrastive-learning methods. On model scaling, we observe that performance on semantic textual similarity (STS) tasks actually degrades once models exceed tens of billions of parameters. Nevertheless, the largest model still significantly outperforms the others on transfer tasks, achieving new state-of-the-art results there. In addition, we fine-tune LLMs with the current mainstream contrastive-learning approach. Combined with our prompt-based method, a 2.7B-parameter OPT model surpasses the 4.8B-parameter ST5 model on STS tasks, again setting a new state of the art. Code is available at: https://github.com/kongds/scaling_sentemb.
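The core of the prompt-based representation described above is simple: the sentence is wrapped in a prompt, and the hidden state of the last token (the position where the model would predict a one-word summary) is used as the sentence embedding. The sketch below illustrates only that pooling step; the random arrays are a stand-in for a real LLM forward pass, and the prompt template is paraphrased from the repository rather than quoted:

```python
import numpy as np

# Hypothetical prompt template in the spirit of the method: the sentence is
# embedded in a prompt that asks the model to compress it into one word.
PROMPT = 'This sentence : "{sentence}" means in one word:'

def last_token_embedding(hidden_states: np.ndarray) -> np.ndarray:
    """Prompt-based pooling: the hidden state of the final prompt token
    (right after 'means in one word:') serves as the sentence embedding."""
    return hidden_states[-1]

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Stand-in for a real forward pass: (seq_len, hidden_dim) activations.
rng = np.random.default_rng(0)
h1 = rng.normal(size=(12, 768))  # activations for prompt-wrapped sentence 1
h2 = rng.normal(size=(15, 768))  # activations for prompt-wrapped sentence 2

e1 = last_token_embedding(h1)
e2 = last_token_embedding(h2)
print(cosine_similarity(e1, e2))
```

In practice the hidden states would come from an autoregressive model such as OPT or LLaMA (e.g. via `output_hidden_states=True` in Hugging Face Transformers), and the resulting embeddings would be scored by cosine similarity on STS pairs.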
Code Repository
kongds/scaling_sentemb
Official
pytorch
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| semantic-textual-similarity-on-sick | PromptEOL+CSE+OPT-13B | Spearman Correlation: 0.8206  | 
| semantic-textual-similarity-on-sick | PromptEOL+CSE+OPT-2.7B | Spearman Correlation: 0.8129  | 
| semantic-textual-similarity-on-sick | PromptEOL+CSE+LLaMA-30B | Spearman Correlation: 0.8238  | 
| semantic-textual-similarity-on-sts-benchmark | PromptEOL+CSE+LLaMA-30B | Spearman Correlation: 0.8914  | 
| semantic-textual-similarity-on-sts-benchmark | PromptEOL+CSE+OPT-13B | Spearman Correlation: 0.8856  | 
| semantic-textual-similarity-on-sts-benchmark | PromptEOL+CSE+OPT-2.7B | Spearman Correlation: 0.8833  | 
| semantic-textual-similarity-on-sts12 | PromptEOL+CSE+LLaMA-30B | Spearman Correlation: 0.7972  | 
| semantic-textual-similarity-on-sts12 | PromptEOL+CSE+OPT-13B | Spearman Correlation: 0.8020  | 
| semantic-textual-similarity-on-sts12 | PromptEOL+CSE+OPT-2.7B | Spearman Correlation: 0.7949  | 
| semantic-textual-similarity-on-sts13 | PromptEOL+CSE+LLaMA-30B | Spearman Correlation: 0.9025  | 
| semantic-textual-similarity-on-sts13 | PromptEOL+CSE+OPT-13B | Spearman Correlation: 0.9024  | 
| semantic-textual-similarity-on-sts13 | PromptEOL+CSE+OPT-2.7B | Spearman Correlation: 0.8964  | 
| semantic-textual-similarity-on-sts14 | PromptEOL+CSE+OPT-13B | Spearman Correlation: 0.8534  | 
| semantic-textual-similarity-on-sts14 | PromptEOL+CSE+OPT-2.7B | Spearman Correlation: 0.8480  | 
| semantic-textual-similarity-on-sts14 | PromptEOL+CSE+LLaMA-30B | Spearman Correlation: 0.8585  | 
| semantic-textual-similarity-on-sts15 | PromptEOL+CSE+OPT-2.7B | Spearman Correlation: 0.8951  | 
| semantic-textual-similarity-on-sts15 | PromptEOL+CSE+OPT-13B | Spearman Correlation: 0.8952  | 
| semantic-textual-similarity-on-sts15 | PromptEOL+CSE+LLaMA-30B | Spearman Correlation: 0.9004  | 
| semantic-textual-similarity-on-sts16 | PromptEOL+CSE+LLaMA-30B | Spearman Correlation: 0.8627  | 
| semantic-textual-similarity-on-sts16 | PromptEOL+CSE+OPT-13B | Spearman Correlation: 0.8590  | 
| semantic-textual-similarity-on-sts16 | PromptEOL+CSE+OPT-2.7B | Spearman Correlation: 0.8591  |
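The metric in the table above is Spearman's rank correlation between model similarity scores and human annotations: the Pearson correlation of the rank-transformed scores. A minimal tie-free sketch with toy scores (real STS data contains ties, which proper Spearman handles by averaging tied ranks; that is omitted here for brevity):

```python
import numpy as np

def spearman(x, y) -> float:
    """Spearman correlation without tie handling: rank both score
    lists, then compute the Pearson correlation of the ranks."""
    def ranks(v):
        order = np.argsort(v)
        r = np.empty(len(v))
        r[order] = np.arange(1, len(v) + 1)  # rank 1 = smallest value
        return r
    rx, ry = ranks(np.asarray(x, float)), ranks(np.asarray(y, float))
    rx -= rx.mean()
    ry -= ry.mean()
    return float(np.dot(rx, ry) / np.sqrt(np.dot(rx, rx) * np.dot(ry, ry)))

# Toy example: human gold scores vs. model cosine similarities.
gold = [1, 2, 3, 4, 5]
pred = [0.1, 0.3, 0.2, 0.8, 0.9]  # one pair of ranks swapped
print(spearman(gold, pred))  # → 0.9
```

Each table entry reports this correlation (shown here on a 0-1 scale) over the sentence pairs of the named STS benchmark.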