Question Answering on PubMedQA
Evaluation Metric
Accuracy
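PubMedQA assigns each question a single gold label drawn from yes / no / maybe, and the reported metric is plain accuracy: the fraction of questions whose predicted label matches the gold label. A minimal sketch of that computation (the function name and the toy labels below are illustrative, not taken from any listed submission):

```python
def accuracy(predictions, references):
    """Fraction of predicted labels that exactly match the gold labels."""
    assert len(predictions) == len(references), "length mismatch"
    correct = sum(p == r for p, r in zip(predictions, references))
    return correct / len(references)

# Toy example: 3 of 4 three-way (yes/no/maybe) answers match -> 75.0.
preds = ["yes", "no", "maybe", "yes"]
golds = ["yes", "no", "yes", "yes"]
print(f"Accuracy: {accuracy(preds, golds) * 100:.1f}")  # 75.0
```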
Evaluation Results
Results of each model on this benchmark.
Comparison Table
Model | Accuracy (%) |
---|---|
mediswift-efficient-sparse-pre-trained | 76.8 |
large-language-models-encode-clinical | 34 |
biogpt-generative-pre-trained-transformer-for | 78.2 |
large-language-models-encode-clinical | 57.8 |
domain-specific-language-model-pretraining | 55.84 |
the-claude-3-model-family-opus-sonnet-haiku | 75.8 |
evaluation-of-large-language-model | 76.80 |
galactica-a-large-language-model-for-science-1 | 77.6 |
pubmedqa-a-dataset-for-biomedical-research | 78.0 |
bioelectra-pretrained-biomedical-text-encoder | 64.2 |
linkbert-pretraining-language-models-with | 70.2 |
galactica-a-large-language-model-for-science-1 | 73.6 |
large-language-models-encode-clinical | 79 |
towards-expert-level-medical-question | 74.0 |
towards-expert-level-medical-question | 75.0 |
large-language-models-encode-clinical | 77.2 |
biomedgpt-open-multimodal-generative-pre | 76.1 |
large-language-models-encode-clinical | 55 |
towards-expert-level-medical-question | 79.2 |
linkbert-pretraining-language-models-with | 72.2 |
galactica-a-large-language-model-for-science-1 | 70.2 |
meditron-70b-scaling-medical-pretraining-for | 81.6 |
biogpt-generative-pre-trained-transformer-for | 81.0 |
the-claude-3-model-family-opus-sonnet-haiku | 74.9 |
can-large-language-models-reason-about | 78.2 |
the-cot-collection-improving-zero-shot-and | 73.42 |
large-language-models-encode-clinical | 75.2 |
large-language-models-encode-clinical | 67.6 |
rankrag-unifying-context-ranking-with | 79.8 |
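For context on how a number in this table is produced: the expert-annotated PubMedQA split (PQA-L, 1,000 questions) is the standard evaluation set, and accuracy is computed over a model's yes/no/maybe decisions. A hedged sketch using the Hugging Face `datasets` library follows; the `predict` stub is a placeholder rather than any model from the table, the dataset id `pubmed_qa`, config `pqa_labeled`, and label field `final_decision` follow the public release, and the exact test subset scored by individual papers may differ.

```python
from datasets import load_dataset

def predict(question: str) -> str:
    # Placeholder model: always answer "yes". Swap in a real model here.
    return "yes"

# PQA-L ships as a single 1,000-example split in the public release.
data = load_dataset("pubmed_qa", "pqa_labeled", split="train")

preds = [predict(ex["question"]) for ex in data]
golds = [ex["final_decision"] for ex in data]  # "yes" / "no" / "maybe"

correct = sum(p == g for p, g in zip(preds, golds))
print(f"Accuracy: {correct / len(golds) * 100:.1f}%")
```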