3 months ago

Open Domain Multi-document Summarization: A Comprehensive Study of Model Brittleness under Retrieval

John Giorgi, Luca Soldaini, Bo Wang, Gary Bader, Kyle Lo, Lucy Lu Wang, Arman Cohan

Abstract

Multi-document summarization (MDS) assumes that a set of topic-related documents is provided as input. In practice, this document set is not always available; it would need to be retrieved given an information need, i.e., a question or topic statement, a setting we dub "open-domain" MDS. We study this more challenging setting by formalizing the task and bootstrapping it using existing datasets, retrievers and summarizers. Via extensive automatic and human evaluation, we determine: (1) state-of-the-art summarizers suffer large reductions in performance when applied to open-domain MDS, (2) additional training in the open-domain setting can reduce this sensitivity to imperfect retrieval, and (3) summarizers are insensitive to the retrieval of duplicate documents and the order of retrieved documents, but highly sensitive to other errors, like the retrieval of irrelevant documents. Based on our results, we provide practical guidelines to enable future work on open-domain MDS, e.g., how to choose the number of retrieved documents to summarize. Our results suggest that new retrieval and summarization methods, as well as annotated resources for training and evaluation, are necessary for further progress in the open-domain setting.
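The retrieve-then-summarize setting described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration and is not taken from the paper: the term-overlap `score` function stands in for a real retriever, `summarize` is a placeholder that only concatenates first sentences, and the corpus, the query, and the name `open_domain_mds` are hypothetical. The point is only to show how the number of retrieved documents `k` controls whether irrelevant text reaches the summarizer.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query, document):
    """Toy relevance score (assumption): cosine similarity over raw term counts."""
    q, d = Counter(tokenize(query)), Counter(tokenize(document))
    dot = sum(q[t] * d[t] for t in q)
    norm = math.sqrt(sum(v * v for v in q.values())) * math.sqrt(
        sum(v * v for v in d.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k):
    """Return the k highest-scoring documents for the query."""
    ranked = sorted(corpus, key=lambda doc: score(query, doc), reverse=True)
    return ranked[:k]


def summarize(documents):
    """Placeholder summarizer (assumption): join the first sentence of each document."""
    firsts = [doc.split(". ")[0].rstrip(".") + "." for doc in documents]
    return " ".join(firsts)


def open_domain_mds(query, corpus, k):
    """Open-domain MDS as a two-stage pipeline: retrieve k documents, then summarize."""
    return summarize(retrieve(query, corpus, k))


# Hypothetical corpus: two on-topic documents and one irrelevant document.
corpus = [
    "Statins lower LDL cholesterol. Trials report fewer cardiac events.",
    "Statins may cause muscle pain. Most cases are mild.",
    "The museum opens at nine. Tickets are sold online.",
]
query = "effects of statins on cholesterol"

print(open_domain_mds(query, corpus, k=2))  # only on-topic documents are summarized
print(open_domain_mds(query, corpus, k=3))  # the irrelevant document leaks into the summary
```

With `k=2` the toy retriever returns only the two on-topic documents, while `k=3` also passes the unrelated museum document to the summarizer. This is a simplified picture of the irrelevant-document retrieval error that the abstract reports summarizers to be highly sensitive to.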

Benchmarks

Benchmark: multi-document-summarization-on-ms-2
Methodology: led-base-16384-ms2
Metrics: BertScoreF1: 0.8693
