Reading Comprehension on MuSeRC

Average F1

Evaluation Results

Performance of each model on this benchmark

Model Name	Average F1	EM	Paper Title	Repository
Baseline TF-IDF1.1	0.587	0.242	RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark
Human Benchmark	0.806	0.42	RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark
RuBERT conversational	0.687	0.278	-	-
ruBert-large finetune	0.76	0.427	-	-
Random weighted	0.45	0.071	Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks	-
ruT5-base-finetune	0.769	0.446	-	-
SBERT_Large	0.646	0.327	-	-
MT5 Large	0.844	0.543	mT5: A massively multilingual pre-trained text-to-text transformer
ruRoberta-large finetune	0.83	0.561	-	-
ruBert-base finetune	0.742	0.399	-	-
RuGPT3Large	0.729	0.333	-	-
YaLM 1.0B few-shot	0.673	0.364	-	-
RuGPT3Medium	0.706	0.308	-	-
SBERT_Large_mt_ru_finetuning	0.642	0.319	-	-
Golden Transformer	0.941	0.819	-	-
Multilingual Bert	0.639	0.239	-	-
ruT5-large-finetune	0.815	0.537	-	-
RuGPT3Small	0.653	0.221	-	-
heuristic majority	0.671	0.237	Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks	-
RuGPT3XL few-shot	0.74	0.546	-	-
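
The two metrics do not always order models the same way: MT5 Large has a higher Average F1 than ruRoberta-large finetune (0.844 vs 0.83) but a lower EM (0.543 vs 0.561). The sketch below shows one way to rank rows by either column. It uses only a handful of rows copied from the table, and the helper names (`parse_rows`, `rank_by`) are hypothetical, not part of any benchmark tooling.

```python
import csv
import io

# A few rows copied from the table above (tab-separated: model, Average F1, EM).
ROWS = (
    "Golden Transformer\t0.941\t0.819\n"
    "MT5 Large\t0.844\t0.543\n"
    "ruRoberta-large finetune\t0.83\t0.561\n"
    "Human Benchmark\t0.806\t0.42\n"
    "Baseline TF-IDF1.1\t0.587\t0.242\n"
)


def parse_rows(text):
    """Parse tab-separated rows into (name, f1, em) tuples."""
    reader = csv.reader(io.StringIO(text), delimiter="\t")
    return [(name, float(f1), float(em)) for name, f1, em in reader]


def rank_by(rows, column):
    """Sort rows in descending order of the given column index (1 = F1, 2 = EM)."""
    return sorted(rows, key=lambda row: row[column], reverse=True)


rows = parse_rows(ROWS)
by_f1 = rank_by(rows, 1)
by_em = rank_by(rows, 2)

for name, f1, em in by_f1:
    print(f"{name}\t{f1}\t{em}")
```

Sorting on column 2 instead of column 1 swaps the second and third places in this subset, which is why it helps to state which metric a ranking is based on.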
