James P. Beno

Abstract
Bidirectional transformers excel at sentiment analysis, and large language models (LLMs) are effective zero-shot learners. Might they perform even better as a team? This paper explores collaborative approaches between ELECTRA and GPT-4o for three-way sentiment classification. We fine-tuned (FT) four models (ELECTRA Base/Large, GPT-4o/4o-mini) on a mix of reviews from the Stanford Sentiment Treebank (SST) and DynaSent, and provided ELECTRA's predicted label, probabilities, and retrieved examples as input to GPT. Sharing the fine-tuned ELECTRA Base's predictions with GPT-4o-mini significantly improved performance over either model alone (82.50 macro F1 vs. 79.14 for ELECTRA Base FT and 79.41 for GPT-4o-mini) and yielded the lowest cost/performance ratio ($0.12 per F1 point gained). However, when the GPT models were fine-tuned, including these predictions decreased performance. GPT-4o FT-M was the top performer (86.99 macro F1), with GPT-4o-mini FT close behind (86.70) at much lower cost ($0.38 vs. $1.59 per F1 point). Our results show that augmenting prompts with predictions from a fine-tuned encoder is an efficient way to boost performance, and that a fine-tuned GPT-4o-mini comes within reach of GPT-4o FT at 76% less cost. Both are affordable options for projects with limited resources.
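The core collaborative setup, passing a fine-tuned encoder's prediction into the LLM prompt, can be illustrated with a minimal sketch. The checkpoint path `electra-base-sentiment-ft`, the label order, and the prompt wording below are illustrative assumptions, not the paper's exact configuration; see the official repository for the actual implementation.

```python
# Minimal sketch (assumed setup): share a fine-tuned ELECTRA classifier's
# predicted label and probabilities with GPT-4o-mini via the prompt.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from openai import OpenAI

LABELS = ["negative", "neutral", "positive"]  # assumed 3-class label order

# Hypothetical fine-tuned checkpoint; substitute your own model path.
tokenizer = AutoTokenizer.from_pretrained("electra-base-sentiment-ft")
encoder = AutoModelForSequenceClassification.from_pretrained("electra-base-sentiment-ft")
encoder.eval()

def electra_predict(text: str):
    """Return ELECTRA's predicted label and per-class probabilities."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = encoder(**inputs).logits
    probs = torch.softmax(logits, dim=-1).squeeze(0).tolist()
    return LABELS[int(torch.argmax(logits))], dict(zip(LABELS, probs))

def classify_with_gpt(text: str) -> str:
    """Ask GPT-4o-mini for a final label, including ELECTRA's prediction in the prompt."""
    label, probs = electra_predict(text)
    prompt = (
        "Classify the sentiment of the review as negative, neutral, or positive.\n"
        f"Review: {text}\n"
        f"A fine-tuned ELECTRA classifier predicts: {label} (probabilities: {probs}).\n"
        "Answer with a single word."
    )
    client = OpenAI()  # requires OPENAI_API_KEY in the environment
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return response.choices[0].message.content.strip().lower()

print(classify_with_gpt("The plot was thin, but the acting saved it."))
```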
Code
jbeno/sentiment
Official
pytorch
Mentioned in GitHub
Benchmarks
| Benchmark | Method | Metric |
|---|---|---|
| sentiment-analysis-on-dynasent | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1: 81.53  | 
| sentiment-analysis-on-dynasent | ELECTRA Large Fine-Tuned | Macro F1: 76.29  | 
| sentiment-analysis-on-dynasent | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1: 77.94  | 
| sentiment-analysis-on-dynasent | GPT-4o + ELECTRA Large FT | Macro F1: 77.69  | 
| sentiment-analysis-on-dynasent | ELECTRA Base Fine-Tuned | Macro F1: 71.83  | 
| sentiment-analysis-on-dynasent | GPT-4o Fine-Tuned (Minimal) | Macro F1: 89  | 
| sentiment-analysis-on-dynasent | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Probabilities) | Macro F1: 79.72  | 
| sentiment-analysis-on-dynasent | GPT-4o-mini + ELECTRA Base FT | Macro F1: 76.19  | 
| sentiment-analysis-on-dynasent | GPT-4o-mini Fine-Tuned | Macro F1: 86.9  | 
| sentiment-analysis-on-dynasent | GPT-4o (Prompt) | Macro F1: 80.22  | 
| sentiment-analysis-on-dynasent | GPT-4o-mini (Prompt) | Macro F1: 77.35  | 
| sentiment-analysis-on-sentiment-merged | GPT-4o-mini + ELECTRA Base FT (Prompt, Label) | Macro F1: 82.74  | 
| sentiment-analysis-on-sentiment-merged | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1: 83.49  | 
| sentiment-analysis-on-sentiment-merged | GPT-4o Fine-Tuned (Minimal) | Macro F1: 86.99  | 
| sentiment-analysis-on-sentiment-merged | GPT-4o (Prompt) | Macro F1: 80.14  | 
| sentiment-analysis-on-sentiment-merged | GPT-4o + ELECTRA Large FT (Prompt, Label) | Macro F1: 81.57  | 
| sentiment-analysis-on-sentiment-merged | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1: 83.09  | 
| sentiment-analysis-on-sentiment-merged | ELECTRA Large Fine-Tuned | Macro F1: 82.36  | 
| sentiment-analysis-on-sentiment-merged | ELECTRA Base Fine-Tuned | Macro F1: 79.29  | 
| sentiment-analysis-on-sentiment-merged | GPT-4o-mini Fine-Tuned | Macro F1: 86.77  | 
| sentiment-analysis-on-sentiment-merged | GPT-4o-mini (Prompt) | Macro F1: 79.52  | 
| sentiment-analysis-on-sst-3 | GPT-4o-mini + ELECTRA Base FT | Macro F1: 71.72  | 
| sentiment-analysis-on-sst-3 | ELECTRA Base Fine-Tuned | Macro F1: 69.95  | 
| sentiment-analysis-on-sst-3 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label) | Macro F1: 70.99  | 
| sentiment-analysis-on-sst-3 | GPT-4o + ELECTRA Large FT | Macro F1: 72.94  | 
| sentiment-analysis-on-sst-3 | GPT-4o-mini (Prompt) | Macro F1: 70.67  | 
| sentiment-analysis-on-sst-3 | GPT-4o Fine-Tuned (Minimal) | Macro F1: 73.99  | 
| sentiment-analysis-on-sst-3 | GPT-4o (Prompt) | Macro F1: 72.2  | 
| sentiment-analysis-on-sst-3 | ELECTRA Large Fine-Tuned | Macro F1: 70.90  | 
| sentiment-analysis-on-sst-3 | GPT-4o-mini + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1: 71.98  | 
| sentiment-analysis-on-sst-3 | GPT-4o-mini Fine-Tuned | Macro F1: 75.68  | 
| sentiment-analysis-on-sst-3 | GPT-4o + ELECTRA Large FT (Prompt, Label, Examples) | Macro F1: 72.06  |