Question Answering on MedQA-USMLE

Metrics

Accuracy
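Accuracy is the fraction of multiple-choice questions for which the model's selected answer option matches the gold answer, reported as a percentage. A minimal sketch of the computation (illustrative names; not tied to any particular paper's evaluation harness):

```python
def accuracy(predictions: list[str], gold: list[str]) -> float:
    """Percentage of questions where the predicted choice matches the gold choice."""
    if len(predictions) != len(gold):
        raise ValueError("prediction/gold length mismatch")
    correct = sum(p == g for p, g in zip(predictions, gold))
    return 100.0 * correct / len(gold)

# Example: 3 of 4 questions answered correctly -> 75.0
print(accuracy(["A", "C", "B", "D"], ["A", "C", "B", "A"]))
```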

Results

Accuracy (%) of various models on the MedQA-USMLE benchmark, sorted from best to worst. A dash in the Repository column indicates that no code repository is linked on the leaderboard.

| Model | Accuracy (%) | Paper | Repository |
|---|---|---|---|
| Med-Gemini | 91.1 | Capabilities of Gemini Models in Medicine | - |
| GPT-4 | 90.2 | Can Generalist Foundation Models Outcompete Special-Purpose Tuning? Case Study in Medicine | |
| Med-PaLM 2 | 85.4 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Med-PaLM 2 (CoT + SC) | 83.7 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| Med-PaLM 2 (5-shot) | 79.7 | Towards Expert-Level Medical Question Answering with Large Language Models | |
| MedMobile (3.8B) | 75.7 | MedMobile: A mobile-sized language model with expert-level clinical capabilities | - |
| Meerkat-7B (Single) | 70.6 | Small Language Models Learn Enhanced Reasoning Skills from Medical Textbooks | - |
| Meditron-70B (CoT + SC) | 70.2 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | |
| Flan-PaLM (540B) | 67.6 | Large Language Models Encode Clinical Knowledge | - |
| Shakti-LLM (2.5B) | 60.3 | SHAKTI: A 2.5 Billion Parameter Small Language Model Optimized for Edge AI and Low-Resource Environments | - |
| LLaMA-2 (70B) | 59.2 | MEDITRON-70B: Scaling Medical Pretraining for Large Language Models | |
| VOD (BioLinkBERT) | 55.0 | Variational Open-Domain Question Answering | |
| PubMedGPT (2.7B) | 50.3 | Large Language Models Encode Clinical Knowledge | - |
| DRAGON + BioLinkBERT | 47.5 | Deep Bidirectional Language-Knowledge Graph Pretraining | |
| BioLinkBERT (340M) | 45.1 | Large Language Models Encode Clinical Knowledge | - |
| GAL 120B (zero-shot) | 44.4 | Galactica: A Large Language Model for Science | |
| BioLinkBERT (base) | 40.0 | LinkBERT: Pretraining Language Models with Document Links | |
| GrapeQA: PEGA | 39.51 | GrapeQA: GRaph Augmentation and Pruning to Enhance Question-Answering | - |
| BioBERT (large) | 36.7 | BioBERT: a pre-trained biomedical language representation model for biomedical text mining | |
| OPT (few-shot, k=5) | 22.8 | Galactica: A Large Language Model for Science | |
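Several rows above are marked "CoT + SC", i.e. chain-of-thought prompting combined with self-consistency decoding: multiple reasoning chains are sampled per question and the final answer letters are majority-voted. A minimal sketch of that scheme, where `generate` is a hypothetical stand-in for an LLM sampling call (not a real API):

```python
from collections import Counter
import re

def generate(prompt: str, temperature: float = 0.7) -> str:
    """Hypothetical stand-in for an LLM sampling call; plug in a real client here."""
    raise NotImplementedError

def self_consistency_answer(question: str, n_samples: int = 5) -> str:
    """Sample several chains of thought and majority-vote the final answer letter."""
    votes = []
    for _ in range(n_samples):
        # Each sampled chain is assumed to end in a line like "Answer: B".
        chain = generate(f"{question}\nLet's think step by step.", temperature=0.7)
        match = re.search(r"Answer:\s*([A-E])", chain)
        if match:
            votes.append(match.group(1))
    # Majority vote over the sampled final answers.
    return Counter(votes).most_common(1)[0][0] if votes else ""
```

On this leaderboard, the self-consistency variants score a few points above their single-pass counterparts (e.g. Med-PaLM 2: 79.7 with 5-shot prompting vs. 83.7 with CoT + SC).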