Question Answering on NewsQA

Evaluation Results

Performance of each model on this benchmark

Model	EM	F1	Paper Title	Repository
OpenAI/o3-mini-2025-01-31-high	96.52	92.13	o3-mini vs DeepSeek-R1: Which One is Safer?
OpenAI/o1-2024-12-17-high	81.44	88.7	0/1 Deep Neural Networks via Block Coordinate Descent	-
xAI/grok-2-1212	70.57	88.24	XAI for Transformers: Better Explanations through Conservative Propagation
deepseek-r1	80.57	86.13	DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
Riple/Saanvi-v0.1	72.61	85.44	Time-series Transformer Generative Adversarial Networks
Anthropic/claude-3-5-sonnet	74.23	82.3	Claude 3.5 Sonnet Model Card Addendum	-
OpenAI/GPT-4o	70.21	81.74	GPT-4o as the Gold Standard: A Scalable and General Purpose Approach to Filter Language Model Pretraining Data	-
Google/Gemini 1.5 Flash	68.75	79.91	Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
SpanBERT	-	73.6	SpanBERT: Improving Pre-training by Representing and Predicting Spans
LinkBERT (large)	-	72.6	LinkBERT: Pretraining Language Models with Document Links
DyREx	-	68.53	DyREx: Dynamic Query Representation for Extractive Question Answering
DecaProp	53.1	66.3	Densely Connected Attention Propagation for Reading Comprehension
BERT+ASGen	54.7	64.5	-	-
AMANDA	48.4	63.7	A Question-Focused Multi-Factor Attention Network for Question Answering
MINIMAL(Dyn)	50.1	63.2	Efficient and Robust Question Answering from Minimal Context over Documents
FastQAExt	43.7	56.1	Making Neural QA as Simple as Possible but not Simpler
