Ryan Lowe, Nissan Pow, Iulian Vlad Serban, Laurent Charlin, Chia-Wei Liu, Joelle Pineau

Abstract
In this paper, we analyze neural network-based dialogue systems trained in an end-to-end manner using an updated version of the recent Ubuntu Dialogue Corpus, a dataset containing almost 1 million multi-turn dialogues, with a total of over 7 million utterances and 100 million words. This dataset is interesting because of its size, long context lengths, and technical nature; thus, it can be used to train large models directly from data with minimal feature engineering. We provide baselines in two different environments: one where models are trained to select the correct next response from a list of candidate responses, and one where models are trained to maximize the log-likelihood of a generated utterance conditioned on the context of the conversation. Both are evaluated on a recall task that we call next utterance classification (NUC), as well as with vector-based metrics that capture the topicality of the responses. We observe that current end-to-end models are unable to completely solve these tasks; thus, we provide a qualitative error analysis to determine the primary causes of error for end-to-end models evaluated on NUC, and examine sample utterances from the generative models. As a result of this analysis, we suggest some promising directions for future research on the Ubuntu Dialogue Corpus, which can also be applied to end-to-end dialogue systems in general.
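The two evaluation settings described above are straightforward to state concretely. Below is a minimal Python sketch, not drawn from the authors' code, of Recall@k for next utterance classification and of one possible vector-based topicality metric (cosine similarity of averaged word embeddings); the function names, candidate-list format, and embedding source are illustrative assumptions.

```python
# Illustrative sketch (not the authors' implementation) of the two
# evaluation settings described in the abstract: Recall@k for next
# utterance classification (NUC), and a vector-based topicality metric
# using the cosine of averaged word embeddings.
import numpy as np

def recall_at_k(scores: np.ndarray, true_index: int, k: int) -> float:
    """Return 1.0 if the ground-truth response is ranked in the top k.

    `scores` holds the model's score for each candidate response given
    the conversation context; `true_index` marks the ground truth.
    """
    top_k = np.argsort(scores)[::-1][:k]  # indices of k highest scores
    return float(true_index in top_k)

def embedding_average_similarity(response_vecs: np.ndarray,
                                 reference_vecs: np.ndarray) -> float:
    """Cosine similarity between the mean word embeddings of two utterances.

    Each argument has shape (num_tokens, embedding_dim); the embeddings
    themselves (e.g. pretrained word vectors) are assumed to be given.
    """
    a = response_vecs.mean(axis=0)
    b = reference_vecs.mean(axis=0)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Example: a 1-in-10 NUC instance with the ground truth at index 0.
scores = np.random.rand(10)
print("Recall@1:", recall_at_k(scores, true_index=0, k=1))
print("Recall@5:", recall_at_k(scores, true_index=0, k=5))
```

In practice, these per-example recalls would be averaged over a test set of contexts, each paired with its ground-truth response and a set of distractor candidates.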
Benchmarks
| Benchmark | Methodology | 1-1 | Local | Shen F-1 |
|---|---|---|---|---|
| conversation-disentanglement-on-linux-irc-ch2 | Heuristic | 43.4 | 67.9 | 50.7 |
| conversation-disentanglement-on-linux-irc-ch2-1 | Heuristic | 45.1 | 73.8 | 51.8 |