Natural Language Inference on RCB
Evaluation Metrics
Accuracy
Average F1
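The two metrics above can be computed directly from model predictions. A minimal sketch follows, assuming that "Average F1" refers to the macro-averaged F1 over the task's three classes (entailment / contradiction / neutral), which is the common interpretation for RCB; the label names and example data are illustrative only.

```python
def accuracy(y_true, y_pred):
    # Fraction of predictions that match the gold label.
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

def macro_f1(y_true, y_pred):
    # Unweighted mean of per-class F1 scores over all observed classes.
    labels = sorted(set(y_true) | set(y_pred))
    f1_scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if (precision + recall) else 0.0)
        f1_scores.append(f1)
    return sum(f1_scores) / len(f1_scores)

# RCB is a three-way NLI task; these labels are illustrative.
gold = ["entailment", "neutral", "contradiction", "neutral"]
pred = ["entailment", "neutral", "neutral", "contradiction"]
print(accuracy(gold, pred))  # 0.5
print(macro_f1(gold, pred))  # 0.5
```

Note that macro-averaging weights each class equally regardless of its frequency, so a model that ignores a rare class is penalized more heavily than under accuracy.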
Evaluation Results
Results of the various models on this benchmark:
Model | Accuracy | Average F1 | Paper Title | Repository
---|---|---|---|---
RuGPT3XL few-shot | 0.418 | 0.302 | - | - |
ruRoberta-large finetune | 0.518 | 0.357 | - | - |
Golden Transformer | 0.546 | 0.406 | - | - |
RuBERT plain | 0.463 | 0.367 | - | - |
ruT5-large-finetune | 0.498 | 0.306 | - | - |
Human Benchmark | 0.702 | 0.680 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | -
ruBert-base finetune | 0.509 | 0.333 | - | - |
RuGPT3Large | 0.484 | 0.417 | - | - |
RuGPT3Small | 0.473 | 0.356 | - | - |
YaLM 1.0B few-shot | 0.447 | 0.408 | - | - |
SBERT_Large | 0.452 | 0.371 | - | - |
Multilingual Bert | 0.445 | 0.367 | - | - |
MT5 Large | 0.454 | 0.366 | mT5: A massively multilingual pre-trained text-to-text transformer | -
ruBert-large finetune | 0.500 | 0.356 | - | -
SBERT_Large_mt_ru_finetuning | 0.486 | 0.351 | - | - |
ruT5-base-finetune | 0.468 | 0.307 | - | - |
heuristic majority | 0.438 | 0.400 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | -
Random weighted | 0.374 | 0.319 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
RuGPT3Medium | 0.461 | 0.372 | - | - |
RuBERT conversational | 0.484 | 0.452 | - | - |