Abstract
In this work, we develop and release Llama 2, a collection of pretrained and fine-tuned large language models (LLMs) ranging in scale from 7 billion to 70 billion parameters. Our fine-tuned models, called Llama 2-Chat, are optimized for dialogue use cases. They outperform open-source chat models on most benchmarks we tested, and based on our human evaluations for helpfulness and safety, they may be a suitable substitute for closed-source models. We provide a detailed description of our approach to fine-tuning and safety improvements of Llama 2-Chat, in order to enable the community to build on our work and contribute to the responsible development of LLMs.
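For orientation, below is a minimal sketch of querying a Llama 2-Chat checkpoint through Hugging Face transformers. The Hub model id and the use of the tokenizer's built-in chat template are assumptions based on the commonly published "meta-llama" releases (the official weights are gated behind Meta's license); this is not part of the paper itself.

```python
# Minimal sketch: dialogue inference with a Llama 2-Chat checkpoint via
# Hugging Face transformers. Assumes access to the gated "meta-llama" Hub
# repo has been granted; the model id below is an assumption, not from
# the paper.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-chat-hf"  # assumed Hub id for the 7B chat model

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# Llama 2-Chat expects its [INST] ... [/INST] prompt format; the tokenizer's
# chat template applies that formatting for us.
messages = [{"role": "user", "content": "Explain RLHF in one sentence."}]
inputs = tokenizer.apply_chat_template(messages, return_tensors="pt").to(model.device)

outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the echoed prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```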
Code Repositories
| Repository | Framework | Notes |
|---|---|---|
| xverse-ai/xverse-13b | pytorch | Mentioned in GitHub |
| coastalcph/eu-politics-llms | pytorch | Mentioned in GitHub |
| facebookresearch/llama | pytorch | Official |
| IBM/Dromedary | pytorch | Mentioned in GitHub |
| squeezeailab/squeezellm | pytorch | Mentioned in GitHub |
| zurichnlp/contradecode | pytorch | Mentioned in GitHub |
| eternityyw/tram-benchmark |  | Mentioned in GitHub |
| xuetianci/pacit | pytorch | Mentioned in GitHub |
| young-geng/easylm | jax | Mentioned in GitHub |
| meetyou-ai-lab/can-mc-evaluate-llms | pytorch | Mentioned in GitHub |
| llamafamily/llama-chinese | pytorch | Mentioned in GitHub |
| glb400/Toy-RecLM | pytorch | Mentioned in GitHub |
| rijgersberg/geitje | pytorch | Mentioned in GitHub |
| flagalpha/llama2-chinese | pytorch | Mentioned in GitHub |
| usyd-fsalab/fp6_llm | pytorch | Mentioned in GitHub |
| idiap/abroad-re | pytorch | Mentioned in GitHub |
| ninglab/ecellm | pytorch | Mentioned in GitHub |
| Lightning-AI/lit-gpt | pytorch | Mentioned in GitHub |
| xzhang97666/alpacare |  | Mentioned in GitHub |
Benchmarks
| Benchmark | Method | Metrics |
|---|---|---|
| arithmetic-reasoning-on-gsm8k | LLaMA 2 70B (one-shot) | Accuracy: 56.8; Parameters (Billion): 70 | 
| code-generation-on-mbpp | Llama 2 34B (0-shot) | Accuracy: 33  | 
| code-generation-on-mbpp | Llama 2 7B (0-shot) | Accuracy: 20.8  | 
| code-generation-on-mbpp | Llama 2 70B (0-shot) | Accuracy: 45  | 
| code-generation-on-mbpp | Llama 2 13B (0-shot) | Accuracy: 30.6  | 
| math-word-problem-solving-on-mawps | LLaMA 2-Chat | Accuracy (%): 82.4  | 
| math-word-problem-solving-on-svamp | LLaMA 2-Chat | Execution Accuracy: 69.2  | 
| multi-task-language-understanding-on-mmlu | LLaMA 2 13B (5-shot) | Average (%): 54.8  | 
| multi-task-language-understanding-on-mmlu | LLaMA 2 34B (5-shot) | Average (%): 62.6  | 
| multi-task-language-understanding-on-mmlu | LLaMA 2 7B (5-shot) | Average (%): 45.3  | 
| multiple-choice-question-answering-mcqa-on-25 | Llama2-7B | Accuracy: 43.38  | 
| multiple-choice-question-answering-mcqa-on-25 | Llama2-7B-chat | Accuracy: 40.07  | 
| question-answering-on-boolq | LLaMA 2 13B (0-shot) | Accuracy: 81.7  | 
| question-answering-on-boolq | LLaMA 2 34B (0-shot) | Accuracy: 83.7  | 
| question-answering-on-boolq | LLaMA 2 7B (0-shot) | Accuracy: 77.4  | 
| question-answering-on-boolq | LLaMA 2 70B (0-shot) | Accuracy: 85  | 
| question-answering-on-multitq | LLaMA2 | Hits@1: 18.5  | 
| question-answering-on-natural-questions | LLaMA 2 70B (one-shot) | EM: 33.0  | 
| question-answering-on-piqa | LLaMA 2 13B (0-shot) | Accuracy: 80.5  | 
| question-answering-on-piqa | LLaMA 2 34B (0-shot) | Accuracy: 81.9  | 
| question-answering-on-piqa | LLaMA 2 7B (0-shot) | Accuracy: 78.8  | 
| question-answering-on-piqa | LLaMA 2 70B (0-shot) | Accuracy: 82.8  | 
| question-answering-on-pubchemqa | Llama2-7B-chat | BLEU-2: 0.075; BLEU-4: 0.009; METEOR: 0.149; ROUGE-1: 0.184; ROUGE-2: 0.043; ROUGE-L: 0.142 | 
| question-answering-on-triviaqa | LLaMA 2 70B (one-shot) | EM: 85  | 
| question-answering-on-uniprotqa | Llama2-7B-chat | BLEU-2: 0.019; BLEU-4: 0.002; METEOR: 0.052; ROUGE-1: 0.103; ROUGE-2: 0.060; ROUGE-L: 0.009 |
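The 0-shot accuracies above for multiple-choice benchmarks such as BoolQ and PIQA are conventionally computed by log-likelihood scoring rather than free-form generation: the model "answers" with whichever choice it assigns the highest total log-probability as a continuation of the question. Below is a minimal sketch of that convention, assuming a Hugging Face checkpoint id; the paper's exact evaluation harness and any length normalization may differ.

```python
# Minimal sketch of zero-shot multiple-choice scoring by log-likelihood
# (the common convention for BoolQ/PIQA-style accuracy; not necessarily
# the paper's exact harness). The model id is an assumption.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # assumed Hub id for the base 7B model
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")
model.eval()

@torch.no_grad()
def choice_logprob(question: str, choice: str) -> float:
    """Sum of token log-probs of `choice` conditioned on `question`."""
    q_ids = tokenizer(question, return_tensors="pt").input_ids.to(model.device)
    full = tokenizer(question + " " + choice, return_tensors="pt").input_ids.to(model.device)
    logits = model(full).logits  # shape (1, seq_len, vocab)
    # Logits at position i predict token i+1, so shift by one.
    logprobs = torch.log_softmax(logits[0, :-1], dim=-1)
    targets = full[0, 1:]
    per_token = logprobs[torch.arange(targets.size(0)), targets]
    # Keep only the tokens of the choice continuation. This assumes the joint
    # tokenization extends the question's tokenization, which holds for
    # typical whitespace boundaries.
    return per_token[q_ids.size(1) - 1:].sum().item()

def predict(question: str, choices: list[str]) -> int:
    """Index of the highest-scoring choice."""
    return max(range(len(choices)), key=lambda i: choice_logprob(question, choices[i]))

# Example usage on a PIQA-style item:
# predict("To open a jar, you should", ["twist the lid.", "twist the glass."])
```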