3 months ago

TAPE: Assessing Few-shot Russian Language Understanding

Abstract

Recent advances in zero-shot and few-shot learning have shown promise for a range of research and practical purposes. However, this fast-growing area lacks standardized evaluation suites for non-English languages, hindering progress outside the Anglo-centric paradigm. To address this gap, we propose TAPE (Text Attack and Perturbation Evaluation), a novel benchmark that includes six more complex NLU tasks for Russian, covering multi-hop reasoning, ethical concepts, logic, and commonsense knowledge. TAPE's design focuses on systematic zero-shot and few-shot NLU evaluation: (i) linguistic-oriented adversarial attacks and perturbations for analyzing robustness, and (ii) subpopulations for nuanced interpretation. A detailed analysis of the autoregressive baselines indicates that simple spelling-based perturbations affect performance the most, while paraphrasing the input has a comparatively negligible effect. At the same time, the results demonstrate a significant gap between the neural and human baselines for most tasks. We publicly release TAPE (tape-benchmark.com) to foster research on robust LMs that can generalize to new tasks when little to no supervision is available.
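The robustness evaluation described in the abstract can be illustrated with a minimal sketch. This is not the benchmark's own implementation: the function names (`swap_adjacent_chars`, `accuracy`, `robustness_drop`), the choice of an adjacent-character swap as the spelling perturbation, and the `predict` callable are all assumptions made for illustration.

```python
import random


def swap_adjacent_chars(text: str, rate: float, rng: random.Random) -> str:
    """Hypothetical spelling perturbation: swap two neighbouring inner
    characters in a word, applied to each word with probability `rate`."""
    out = []
    for word in text.split(" "):
        if len(word) > 3 and rng.random() < rate:
            # Pick an inner position so the first and last characters stay fixed.
            i = rng.randrange(1, len(word) - 2)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        out.append(word)
    return " ".join(out)


def accuracy(predict, examples) -> float:
    """Share of (input, label) pairs for which predict(input) == label."""
    correct = sum(predict(x) == y for x, y in examples)
    return correct / len(examples)


def robustness_drop(predict, examples, rate: float = 1.0, seed: int = 0) -> float:
    """Accuracy on clean inputs minus accuracy on perturbed inputs."""
    rng = random.Random(seed)
    perturbed = [(swap_adjacent_chars(x, rate, rng), y) for x, y in examples]
    return accuracy(predict, examples) - accuracy(predict, perturbed)
```

A larger drop means the model is more sensitive to the perturbation. The same comparison could be repeated with a paraphrasing function in place of `swap_adjacent_chars` to contrast the two perturbation families.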

Benchmarks

| Benchmark | Methodology | Metrics |
|---|---|---|
| ethics-on-ethics | RuGPT-3 Large | Accuracy: 68.6 |
| ethics-on-ethics | Human benchmark | Accuracy: 52.9 |
| ethics-on-ethics | RuGPT-3 Medium | Accuracy: 68.3 |
| ethics-on-ethics | RuGPT-3 Small | Accuracy: 55.5 |
| ethics-on-ethics-2 | RuGPT-3 Small | Accuracy: 60.9 |
| ethics-on-ethics-2 | Human benchmark | Accuracy: 67.6 |
| ethics-on-ethics-2 | RuGPT-3 Large | Accuracy: 44.9 |
| ethics-on-ethics-2 | RuGPT-3 Medium | Accuracy: 44.1 |
| logical-reasoning-on-ruworldtree | RuGPT-3 Medium | Accuracy: 38.0 |
| logical-reasoning-on-ruworldtree | Human benchmark | Accuracy: 83.7 |
| logical-reasoning-on-ruworldtree | RuGPT-3 Small | Accuracy: 34.0 |
| logical-reasoning-on-ruworldtree | RuGPT-3 Large | Accuracy: 40.7 |
| logical-reasoning-on-winograd-automatic | RuGPT-3 Small | Accuracy: 57.9 |
| logical-reasoning-on-winograd-automatic | RuGPT-3 Medium | Accuracy: 57.2 |
| logical-reasoning-on-winograd-automatic | Human benchmark | Accuracy: 87.0 |
| logical-reasoning-on-winograd-automatic | RuGPT-3 Large | Accuracy: 55.5 |
| question-answering-on-chegeka | RuGPT-3 Large | Accuracy: 00 |
| question-answering-on-chegeka | Human benchmark | Accuracy: 64.5 |
| question-answering-on-chegeka | RuGPT-3 Medium | Accuracy: 00 |
| question-answering-on-chegeka | RuGPT-3 Small | Accuracy: 00 |
| question-answering-on-multiq | RuGPT-3 Small | Accuracy: 00 |
| question-answering-on-multiq | RuGPT-3 Medium | Accuracy: 00 |
| question-answering-on-multiq | RuGPT-3 Large | Accuracy: 00 |
| question-answering-on-multiq | Human benchmark | Accuracy: 91.0 |
| question-answering-on-ruopenbookqa | RuGPT-3 Large | Accuracy: 55.5 |
| question-answering-on-ruopenbookqa | RuGPT-3 Small | Accuracy: 57.9 |
| question-answering-on-ruopenbookqa | Human benchmark | Accuracy: 86.5 |
| question-answering-on-ruopenbookqa | RuGPT-3 Medium | Accuracy: 57.2 |
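
The gap between the human and neural baselines can be read off the listing above with a short calculation. The sketch below copies the accuracy values as listed and omits the two tasks (chegeka, multiq) where the listing shows 00 for every model; the names `scores`, `human_gap`, and `gaps` are illustrative assumptions, not part of the benchmark.

```python
# Accuracy values copied from the listing above (percent).
scores = {
    "ethics-on-ethics": {
        "Human benchmark": 52.9, "RuGPT-3 Small": 55.5,
        "RuGPT-3 Medium": 68.3, "RuGPT-3 Large": 68.6,
    },
    "ethics-on-ethics-2": {
        "Human benchmark": 67.6, "RuGPT-3 Small": 60.9,
        "RuGPT-3 Medium": 44.1, "RuGPT-3 Large": 44.9,
    },
    "logical-reasoning-on-ruworldtree": {
        "Human benchmark": 83.7, "RuGPT-3 Small": 34.0,
        "RuGPT-3 Medium": 38.0, "RuGPT-3 Large": 40.7,
    },
    "logical-reasoning-on-winograd-automatic": {
        "Human benchmark": 87.0, "RuGPT-3 Small": 57.9,
        "RuGPT-3 Medium": 57.2, "RuGPT-3 Large": 55.5,
    },
    "question-answering-on-ruopenbookqa": {
        "Human benchmark": 86.5, "RuGPT-3 Small": 57.9,
        "RuGPT-3 Medium": 57.2, "RuGPT-3 Large": 55.5,
    },
}


def human_gap(task_scores: dict) -> float:
    """Human accuracy minus the best model accuracy, in percentage points."""
    human = task_scores["Human benchmark"]
    best_model = max(v for k, v in task_scores.items() if k != "Human benchmark")
    return round(human - best_model, 1)


gaps = {task: human_gap(task_scores) for task, task_scores in scores.items()}
for task, gap in gaps.items():
    print(f"{task}: {gap:+.1f}")
```

A positive value means the human baseline is ahead of the best listed model; a negative value means the best model scores higher than the human baseline in this listing.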
