HyperAI
HyperAI超神经
Question Answering on PubMedQA
Evaluation metric: Accuracy
Results: performance of each model on this benchmark
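The accuracy figures below can be read as the percentage of questions answered with the exact gold label. A minimal sketch of this scoring, assuming the standard PubMedQA setup where each question's gold answer is one of "yes", "no", or "maybe" (this is illustrative code, not HyperAI's evaluation pipeline):

```python
def pubmedqa_accuracy(predictions, gold_labels):
    """Percentage of predictions exactly matching the gold labels."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must align")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return 100.0 * correct / len(gold_labels)

# Hypothetical example: 4 of 5 predictions match the gold labels.
preds = ["yes", "no", "maybe", "yes", "no"]
gold  = ["yes", "no", "maybe", "no", "no"]
print(pubmedqa_accuracy(preds, gold))  # 80.0
```

Under this convention, a score of 81.6 means 81.6% of the test questions were answered with the correct label.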
| Model | Accuracy (%) | Paper |
|---|---|---|
| Meditron-70B (CoT + SC) | 81.6 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models |
| BioGPT-Large (1.5B) | 81.0 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining |
| RankRAG-llama3-70B (Zero-Shot) | 79.8 | RankRAG: Unifying Context Ranking with Retrieval-Augmented Generation in LLMs |
| Med-PaLM 2 (5-shot) | 79.2 | Towards Expert-Level Medical Question Answering with Large Language Models |
| Flan-PaLM (540B, Few-shot) | 79.0 | Large Language Models Encode Clinical Knowledge |
| BioGPT (345M) | 78.2 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining |
| Codex (5-shot CoT) | 78.2 | Can large language models reason about medical questions? |
| Human Performance (single annotator) | 78.0 | PubMedQA: A Dataset for Biomedical Research Question Answering |
| GAL 120B (zero-shot) | 77.6 | Galactica: A Large Language Model for Science |
| Flan-PaLM (62B, Few-shot) | 77.2 | Large Language Models Encode Clinical Knowledge |
| MediSwift-XL | 76.8 | MediSwift: Efficient Sparse Pre-trained Biomedical Language Models |
| Flan-T5-XXL | 76.8 | Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark |
| BioMedGPT-10B | 76.1 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine |
| Claude 3 Opus (5-shot) | 75.8 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| Flan-PaLM (540B, SC) | 75.2 | Large Language Models Encode Clinical Knowledge |
| Med-PaLM 2 (ER) | 75.0 | Towards Expert-Level Medical Question Answering with Large Language Models |
| Claude 3 Opus (zero-shot) | 74.9 | The Claude 3 Model Family: Opus, Sonnet, Haiku |
| Med-PaLM 2 (CoT + SC) | 74.0 | Towards Expert-Level Medical Question Answering with Large Language Models |
| BLOOM (zero-shot) | 73.6 | Galactica: A Large Language Model for Science |
| CoT-T5-11B (1024-shot) | 73.42 | The CoT Collection: Improving Zero-shot and Few-shot Learning of Language Models via Chain-of-Thought Fine-Tuning |