Command Palette
Search for a command to run...
HONEST: Measuring Hurtful Sentence Completion in Language Models
{Dirk Hovy Federico Bianchi Debora Nozza}

Abstract
Language models have revolutionized the field of NLP. However, language models capture and proliferate hurtful stereotypes, especially in text generation. Our results show that 4.3{%} of the time, language models complete a sentence with a hurtful word. These cases are not random, but follow language and gender-specific patterns. We propose a score to measure hurtful sentence completions in language models (HONEST). It uses a systematic template- and lexicon-based bias evaluation methodology for six languages. Our findings suggest that these models replicate and amplify deep-seated societal stereotypes about gender roles. Sentence completions refer to sexual promiscuity when the target is female in 9{%} of the time, and in 4{%} to homosexuality when the target is male. The results raise questions about the use of these models in production settings.
Benchmarks
| Benchmark | Methodology | Metrics |
|---|---|---|
| hurtful-sentence-completion-on-honest-en | BERT-large | HONEST: 3.33 |
| hurtful-sentence-completion-on-honest-en | RoBERTa-large | HONEST: 2.62 |
| hurtful-sentence-completion-on-honest-en | RoBERTa-base | HONEST: 2.38 |
| hurtful-sentence-completion-on-honest-en | BERT-base | HONEST: 1.19 |
| hurtful-sentence-completion-on-honest-en | DistilBERT-base | HONEST: 1.90 |
Build AI with AI
From idea to launch — accelerate your AI development with free AI co-coding, out-of-the-box environment and best price of GPUs.