Proxy Indicators for the Quality of Open-domain Dialogues

Ricardo Usbeck, Jens Lehmann, Rostislav Nedelchev

Abstract

The automatic evaluation of open-domain dialogues remains a largely unsolved challenge. Despite the abundance of work in the field, human judges still have to evaluate dialogue quality, so performing such evaluations at scale is usually expensive. This work investigates using a deep-learning model trained on the General Language Understanding Evaluation (GLUE) benchmark as a quality indicator for open-domain dialogues. The aim is to use the various GLUE tasks as different perspectives for judging the quality of a conversation, thus reducing the need for additional training data or for reference responses. Because of this design, the method can infer various quality metrics and derive a component-based overall score. We achieve statistically significant correlation coefficients of up to 0.7.
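The idea of a component-based overall score can be illustrated with a small sketch. The benchmark entries on this page name the method "Lin-Reg (all)", which suggests a linear regression over all indicator scores; everything else below is an assumption made for illustration. The two feature columns stand in for hypothetical per-task indicator scores, the targets are synthetic stand-ins for human ratings, and the helper names (`solve_linear_system`, `fit_linear_regression`, `overall_score`) are not taken from the paper.

```python
def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Build the augmented matrix [a | b] without mutating the inputs.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system; features may be collinear")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from every other row.
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_linear_regression(features, targets):
    """Ordinary least squares via the normal equations.

    Returns [intercept, w_1, ..., w_k].
    """
    rows = [[1.0] + list(f) for f in features]  # prepend a bias term
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def overall_score(weights, feature_row):
    """Combine component scores into one overall score."""
    return weights[0] + sum(w * x for w, x in zip(weights[1:], feature_row))


# Synthetic data (assumption): two hypothetical indicator scores per dialogue.
features = [(0.1, 0.9), (0.5, 0.2), (0.8, 0.4), (0.3, 0.7), (0.9, 0.1)]
# Synthetic "human ratings", generated exactly as 1 + 2*x1 + 3*x2.
targets = [1 + 2 * x1 + 3 * x2 for x1, x2 in features]

weights = fit_linear_regression(features, targets)
print(weights)
print(overall_score(weights, (0.5, 0.5)))
```

Because the synthetic targets are an exact linear function of the features, the fitted weights should recover the generating coefficients up to floating-point error. With real annotations the fit would only be approximate, and a library routine would usually replace the hand-written solver.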

Benchmarks

Benchmark | Methodology | Pearson Correlation | Spearman Correlation
dialogue-evaluation-on-usr-personachat | Lin-Reg (all) | 0.5290 | 0.5382
dialogue-evaluation-on-usr-topicalchat | Lin-Reg (all) | 0.4974 | 0.4877
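For readers unfamiliar with the two reported metrics, the following is a minimal sketch of how Pearson and Spearman correlations between automatic scores and human ratings can be computed. It is not the paper's evaluation code; the function names are illustrative and the sample values are made up. Spearman correlation is computed here as the Pearson correlation of average ranks, which handles ties.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance normalised by both standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to the one at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


# Made-up example values (assumption): model scores vs. human ratings.
model_scores = [0.2, 0.4, 0.5, 0.9]
human_ratings = [1.0, 2.0, 2.0, 5.0]
print(pearson(model_scores, human_ratings))
print(spearman(model_scores, human_ratings))
```

Pearson measures linear agreement, while Spearman only measures whether the two rankings agree, so a monotonic but non-linear relationship yields a Spearman value of 1 with a lower Pearson value.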
