Question Answering on DaNetQA
Evaluation Metric
Accuracy
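DaNetQA is a binary (yes/no) question answering task, so accuracy is simply the share of questions whose predicted label matches the gold label. A minimal sketch of the metric; the function name and list-based inputs are illustrative, not part of any benchmark tooling:

```python
from typing import Sequence


def accuracy(predictions: Sequence[bool], gold_labels: Sequence[bool]) -> float:
    """Fraction of yes/no predictions that match the gold labels."""
    if len(predictions) != len(gold_labels):
        raise ValueError("predictions and gold_labels must have the same length")
    correct = sum(p == g for p, g in zip(predictions, gold_labels))
    return correct / len(gold_labels)


# Example: 3 correct answers out of 4 -> 0.75
print(accuracy([True, False, True, True], [True, False, False, True]))
```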
Evaluation Results
Performance of each model on this benchmark.
Model | Accuracy | Paper Title | Repository |
---|---|---|---|
Human Benchmark | 0.915 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | |
Multilingual Bert | 0.624 | - | - |
ruRoberta-large finetune | 0.82 | - | - |
MT5 Large | 0.657 | mT5: A massively multilingual pre-trained text-to-text transformer | |
majority_class | 0.503 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
RuGPT3Medium | 0.634 | - | - |
RuBERT plain | 0.639 | - | - |
RuGPT3Small | 0.61 | - | - |
ruBert-large finetune | 0.773 | - | - |
Baseline TF-IDF1.1 | 0.621 | RussianSuperGLUE: A Russian Language Understanding Evaluation Benchmark | |
YaLM 1.0B few-shot | 0.637 | - | - |
ruT5-large-finetune | 0.711 | - | - |
Random weighted | 0.52 | Unreasonable Effectiveness of Rule-Based Heuristics in Solving Russian SuperGLUE Tasks | - |
RuGPT3XL few-shot | 0.59 | - | - |
RuBERT conversational | 0.606 | - | - |
SBERT_Large_mt_ru_finetuning | 0.697 | - | - |
Golden Transformer | 0.917 | - | - |
RuGPT3Large | 0.604 | - | - |
SBERT_Large | 0.675 | - | - |
ruT5-base-finetune | 0.732 | - | - |
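The table above includes trivial baselines such as majority_class, which always predicts the most frequent answer and still reaches 0.503 accuracy. A sketch of how such a baseline could be scored, assuming DaNetQA is distributed as a JSON-lines file with a boolean `label` field per example; the file path and field name are assumptions about the released data, not a documented interface:

```python
import json
from collections import Counter

# Hypothetical path to a DaNetQA validation split in JSON-lines format;
# each line is assumed to contain a boolean "label" field.
VAL_PATH = "DaNetQA/val.jsonl"

with open(VAL_PATH, encoding="utf-8") as f:
    labels = [json.loads(line)["label"] for line in f]

# Majority-class baseline: always predict the most frequent label.
majority_label, _ = Counter(labels).most_common(1)[0]
predictions = [majority_label] * len(labels)

correct = sum(p == g for p, g in zip(predictions, labels))
print(f"majority_class accuracy: {correct / len(labels):.3f}")
```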