Can large language models reason about medical questions?

Valentin Liévin, Christoffer Egeberg Hother, Andreas Geert Motzfeldt, Ole Winther

Abstract

Although large language models (LLMs) often produce impressive outputs, it remains unclear how they perform in real-world scenarios requiring strong reasoning skills and expert domain knowledge. We set out to investigate whether closed- and open-source models (GPT-3.5, Llama-2, etc.) can be applied to answer and reason about difficult real-world-based questions. We focus on three popular medical benchmarks (MedQA-USMLE, MedMCQA, and PubMedQA) and multiple prompting scenarios: Chain-of-Thought (CoT, think step-by-step), few-shot, and retrieval augmentation. Based on an expert annotation of the generated CoTs, we found that InstructGPT can often read, reason, and recall expert knowledge. Finally, by leveraging advances in prompt engineering (few-shot and ensemble methods), we demonstrated that GPT-3.5 not only yields calibrated predictive distributions but also reaches the passing score on three datasets: MedQA-USMLE 60.2%, MedMCQA 62.7%, and PubMedQA 78.2%. Open-source models are closing the gap: Llama-2 70B also passed the MedQA-USMLE with 62.5% accuracy.
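The prompting and ensemble ideas mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names (`build_cot_prompt`, `aggregate_votes`, `predict`), the prompt wording, and the shot dictionary layout are assumptions made for illustration, and the model call that would produce the sampled answers is deliberately left out. The sketch assembles a few-shot CoT prompt, then turns several sampled answer letters into a predictive distribution by majority vote.

```python
from collections import Counter


def format_question(question, options):
    """Render a question and its lettered options as plain text."""
    lines = [f"Question: {question}"]
    lines += [f"{label}) {text}" for label, text in options.items()]
    return "\n".join(lines)


def build_cot_prompt(question, options, shots=()):
    """Assemble a few-shot chain-of-thought prompt (hypothetical template).

    Each shot is a dict with the illustrative keys "question", "options",
    "reasoning" and "answer"; this layout is an assumption, not the paper's.
    """
    blocks = []
    for shot in shots:
        blocks.append(
            format_question(shot["question"], shot["options"])
            + f"\nAnswer: Let's think step by step. {shot['reasoning']}"
            + f" Therefore, the answer is {shot['answer']}."
        )
    # The final block ends with the step-by-step cue so the model continues it.
    blocks.append(
        format_question(question, options)
        + "\nAnswer: Let's think step by step."
    )
    return "\n\n".join(blocks)


def aggregate_votes(sampled_answers, labels):
    """Turn sampled answer letters into a distribution over the labels.

    Answers outside `labels` (e.g. unparsable completions) are ignored.
    If no valid answer remains, fall back to a uniform distribution.
    """
    counts = Counter(a for a in sampled_answers if a in labels)
    total = sum(counts.values())
    if total == 0:
        return {label: 1 / len(labels) for label in labels}
    return {label: counts[label] / total for label in labels}


def predict(sampled_answers, labels):
    """Return the most-voted label (ties go to the earlier label) and the distribution."""
    dist = aggregate_votes(sampled_answers, labels)
    return max(labels, key=dist.get), dist
```

In use, each sampled completion of the prompt would be parsed for its final answer letter, and the resulting list passed to `predict`. The vote shares give a simple predictive distribution whose calibration can then be checked against held-out accuracy.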

Code Repositories

vlievin/medical-reasoning
Official
Mentioned in GitHub

Benchmarks

Benchmark | Methodology | Metrics
multiple-choice-question-answering-mcqa-on-21 | Codex 5-shot CoT | Dev Set (Acc-%): 59.7; Test Set (Acc-%): 62.7
question-answering-on-medqa-usmle | Codex 5-shot CoT | Accuracy: 60.2
question-answering-on-pubmedqa | Codex 5-shot CoT | Accuracy: 78.2
