Can BERT eat RuCoLA? Topological Data Analysis to Explain

Irina Proskurina; Irina Piontkovskaya; Ekaterina Artemova

Abstract

This paper investigates how Transformer language models (LMs) fine-tuned for acceptability classification capture linguistic features. Our approach uses the best practices of topological data analysis (TDA) in NLP: we construct directed attention graphs from attention matrices, derive topological features from them, and feed them to linear classifiers. We introduce two novel features, chordality and the matching number, and show that TDA-based classifiers outperform fine-tuning baselines. We experiment with two datasets, CoLA and RuCoLA, which cover English and Russian, two typologically different languages. In addition, we propose several black-box introspection techniques aimed at detecting changes in the attention mode of the LMs during fine-tuning, defining the LM's prediction confidences, and associating individual heads with fine-grained grammar phenomena. Our results contribute to understanding the behavior of monolingual LMs in the acceptability classification task, provide insights into the functional roles of attention heads, and highlight the advantages of TDA-based approaches for analyzing LMs. We release the code and the experimental results for further uptake.
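As a rough illustration of the two graph features named in the abstract, the sketch below turns one attention matrix into a graph and computes its chordality and matching number. It is not the authors' released code. The threshold value, the symmetrization of the directed attention graph into an undirected one, and the function names `attention_to_graph`, `is_chordal`, and `matching_number` are all assumptions made for this example. The matching routine is a brute-force search that is only practical for short sentences.

```python
from functools import lru_cache
from itertools import combinations


def attention_to_graph(attn, threshold=0.2):
    """Build undirected adjacency sets from a square attention matrix.

    Assumption: an edge {i, j} is kept when attn[i][j] or attn[j][i] is at
    least `threshold`; the diagonal (self-attention) is ignored.
    """
    n = len(attn)
    adj = [set() for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and attn[i][j] >= threshold:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def is_chordal(adj):
    """Chordality test via maximum cardinality search (MCS).

    A graph is chordal iff, for every vertex in MCS visiting order, its
    already-visited neighbours form a clique.
    """
    n = len(adj)
    seen = set()
    weight = [0] * n  # number of already-visited neighbours per vertex
    for _ in range(n):
        v = max((u for u in range(n) if u not in seen), key=lambda u: weight[u])
        earlier = adj[v] & seen
        if any(b not in adj[a] for a, b in combinations(earlier, 2)):
            return False
        seen.add(v)
        for u in adj[v] - seen:
            weight[u] += 1
    return True


def matching_number(adj):
    """Size of a maximum matching, by exhaustive search over vertex subsets.

    Exponential in the number of tokens, so suitable for toy inputs only.
    """
    n = len(adj)

    @lru_cache(maxsize=None)
    def best(free):
        # `free` is a bitmask of vertices that are still unmatched.
        if free == 0:
            return 0
        v = (free & -free).bit_length() - 1  # lowest-indexed free vertex
        rest = free & ~(1 << v)
        result = best(rest)  # option 1: leave v unmatched
        for u in adj[v]:
            if (rest >> u) & 1:  # option 2: match v with a free neighbour u
                result = max(result, 1 + best(rest & ~(1 << u)))
        return result

    return best((1 << n) - 1)


# Toy 4-token attention matrix (made-up values, rows sum to 1).
toy_attention = [
    [0.70, 0.30, 0.00, 0.00],
    [0.05, 0.60, 0.35, 0.00],
    [0.00, 0.05, 0.55, 0.40],
    [0.45, 0.00, 0.05, 0.50],
]
graph = attention_to_graph(toy_attention, threshold=0.2)
features = [int(is_chordal(graph)), matching_number(graph)]
print(features)  # the thresholded graph is a 4-cycle: [0, 2]
```

In a full pipeline, features like these would be computed per attention head and concatenated into one vector per sentence before being passed to a linear classifier; for longer inputs a polynomial-time matching algorithm would replace the exhaustive search.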

Benchmarks

Benchmark                             Method           Accuracy (%)   MCC
linguistic-acceptability-on-cola      BERT+TDA         88.2           0.726
linguistic-acceptability-on-cola      RoBERTa+TDA      87.3           0.695
linguistic-acceptability-on-rucola    Ru-BERT+TDA      80.1           0.478
linguistic-acceptability-on-rucola    Ru-RoBERTa+TDA   85.7           0.594
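For readers unfamiliar with the second metric, the following sketch shows how accuracy and the Matthews correlation coefficient (MCC) are computed for a binary acceptability task. The labels are made-up toy values, not results from the paper, and the helper names `accuracy` and `mcc` are chosen for this example only.

```python
import math


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the gold labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels (1 = acceptable)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # By convention MCC is reported as 0 when the denominator vanishes.
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Toy gold labels and predictions (hypothetical).
gold = [1, 1, 1, 1, 0, 0]
pred = [1, 1, 1, 0, 0, 1]
print(round(accuracy(gold, pred), 3))  # 0.667
print(mcc(gold, pred))                 # 0.25
```

The toy case shows why both numbers are reported: on data skewed toward acceptable sentences, accuracy can look reasonable while MCC, which accounts for all four confusion-matrix cells, stays low.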
