HyperAI超神经

Question Answering on PubMedQA

Evaluation Metrics

Accuracy

Evaluation Results

Performance of each model on this benchmark

| Model | Accuracy | Paper Title | Repository |
| --- | --- | --- | --- |
| MediSwift-XL | 76.8 | MediSwift: Efficient Sparse Pre-trained Biomedical Language Models | - |
| PaLM (8B, Few-shot) | 34 | Large Language Models Encode Clinical Knowledge | - |
| BioGPT (345M) | 78.2 | BioGPT: Generative Pre-trained Transformer for Biomedical Text Generation and Mining | |
| PaLM (62B, Few-shot) | 57.8 | Large Language Models Encode Clinical Knowledge | - |
| PubMedBERT uncased | 55.84 | Domain-Specific Language Model Pretraining for Biomedical Natural Language Processing | |
| Claude 3 Opus (5-shot) | 75.8 | The Claude 3 Model Family: Opus, Sonnet, Haiku | - |
| Flan-T5-XXL | 76.80 | Evaluation of large language model performance on the Biomedical Language Understanding and Reasoning Benchmark | - |
| GAL 120B (zero-shot) | 77.6 | Galactica: A Large Language Model for Science | |
| Human Performance (single annotator) | 78.0 | PubMedQA: A Dataset for Biomedical Research Question Answering | |
| BioELECTRA uncased | 64.2 | BioELECTRA: Pretrained Biomedical text Encoder using Discriminators | |
| BioLinkBERT (base) | 70.2 | LinkBERT: Pretraining Language Models with Document Links | |
| BLOOM (zero-shot) | 73.6 | Galactica: A Large Language Model for Science | |
| Flan-PaLM (540B, Few-shot) | 79 | Large Language Models Encode Clinical Knowledge | - |
| Med-PaLM 2 (CoT + SC) | 74.0 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Med-PaLM 2 (ER) | 75.0 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Flan-PaLM (62B, Few-shot) | 77.2 | Large Language Models Encode Clinical Knowledge | - |
| BioMedGPT-10B | 76.1 | BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine | |
| PaLM (540B, Few-shot) | 55 | Large Language Models Encode Clinical Knowledge | - |
| Med-PaLM 2 (5-shot) | 79.2 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| BioLinkBERT (large) | 72.2 | LinkBERT: Pretraining Language Models with Document Links | |