Susan Zhang; Stephen Roller; Naman Goyal; Mikel Artetxe; Moya Chen; Shuohui Chen; Christopher Dewan; Mona Diab; Xian Li; Xi Victoria Lin; Todor Mihaylov; Myle Ott; Sam Shleifer; Kurt Shuster; Daniel Simig; Punit Singh Koura; Anjali Sridhar; Tianlu Wang; Luke Zettlemoyer

Abstract
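Large language models, which often require tens or even hundreds of thousands of compute days to train, have shown remarkable capabilities for zero-shot and few-shot learning. Given their high computational cost, these models are difficult to replicate without significant capital. For the few that are available through APIs, researchers have no access to the full model weights, which makes them difficult to study. We present Open Pre-trained Transformers (OPT), a suite of decoder-only pre-trained transformers ranging from 125M to 175B parameters, which we plan to share fully and responsibly with interested researchers. We show that OPT-175B is comparable to GPT-3 while requiring only 1/7th the carbon footprint to develop. We are also releasing our logbook, which details the infrastructure challenges we faced, along with code for experimenting with all of the released models.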
Code Repositories
facebookresearch/metaseq
Official
pytorch
znhy1024/protoco
pytorch
Mentioned in GitHub
ecolab-postech/owq
pytorch
Mentioned in GitHub
mthcom/hscore-dataset-pruning
pytorch
Mentioned in GitHub
pku-alignment/safe-rlhf
pytorch
Mentioned in GitHub
xvyaward/owq
pytorch
Mentioned in GitHub
liangyuwang/zo2
pytorch
Mentioned in GitHub
MindCode-4/code-2/tree/main/opt
mindspore
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| hate-speech-detection-on-ethos-binary | OPT-175B (one-shot) | F1-score: 0.713  | 
| hate-speech-detection-on-ethos-binary | OPT-175B (zero-shot) | F1-score: 0.667  | 
| hate-speech-detection-on-ethos-binary | Davinci (few-shot) | F1-score: 0.354  | 
| hate-speech-detection-on-ethos-binary | OPT-175B (few-shot) | F1-score: 0.759  | 
| hate-speech-detection-on-ethos-binary | Davinci (one-shot) | F1-score: 0.616  | 
| hate-speech-detection-on-ethos-binary | Davinci (zero-shot) | F1-score: 0.628  | 
| stereotypical-bias-analysis-on-crows-pairs | GPT-3 | Age: 64.4 Disability: 76.7 Gender: 62.6 Nationality: 61.6 Overall: 67.2 Physical Appearance: 74.6 Race/Color: 64.7 Religion: 62.6 Sexual Orientation: 76.2 Socioeconomic status: 73.8  | 
| stereotypical-bias-analysis-on-crows-pairs | OPT-175B | Age: 67.8 Disability: 76.7 Gender: 65.7 Nationality: 62.9 Overall: 69.5 Physical Appearance: 76.2 Race/Color: 68.6 Religion: 65.7 Sexual Orientation: 78.6 Socioeconomic status: 76.2  |