GenCompareSum: a hybrid unsupervised summarization method using salience

Sophia Ananiadou, Qianqian Xie, Jennifer Bishop

Abstract

Text summarization (TS) is an important NLP task. Pre-trained Language Models (PLMs) have been used to improve the performance of TS. However, PLMs are limited by their need for labelled training data and by their attention mechanism, which often makes them unsuitable for use on long documents. To this end, we propose a hybrid, unsupervised, abstractive-extractive approach, in which we walk through a document, generating salient textual fragments representing its key points. We then select the most important sentences of the document by choosing the sentences most similar to the generated texts, with similarity calculated using BERTScore. We evaluate the efficacy of generating and using salient textual fragments to guide extractive summarization on documents from the biomedical and general scientific domains. We compare the performance between long and short documents using different generative text models, which are fine-tuned to generate relevant queries or document titles. We show that our hybrid approach outperforms existing unsupervised methods, as well as state-of-the-art supervised methods, despite not needing a vast amount of labelled training data.
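
The generate-then-compare loop described in the abstract can be illustrated with a minimal Python sketch. Everything named here is an assumption made for illustration: `gen_compare_select`, its parameters, and the `generate` callable (a placeholder for a fine-tuned generative model) are hypothetical, and `overlap_f1` is a simple word-overlap stand-in for BERTScore so that the example runs without downloading a model. The abstract does not say how sections are formed or how scores are aggregated; fixed-size sections and a plain sum are assumed.

```python
from collections import Counter
from typing import Callable, List, Sequence


def overlap_f1(candidate: str, reference: str) -> float:
    """Stand-in similarity: unigram-overlap F1. This is NOT BERTScore."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    common = sum((cand & ref).values())
    if common == 0:
        return 0.0
    precision = common / sum(cand.values())
    recall = common / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


def gen_compare_select(
    sentences: Sequence[str],
    generate: Callable[[str], List[str]],
    similarity: Callable[[str, str], float] = overlap_f1,
    section_size: int = 3,
    num_sentences: int = 2,
) -> List[str]:
    """Hypothetical sketch of a generate-then-compare extractive summarizer."""
    # 1. Walk through the document section by section, generating
    #    short fragments for each section (assumed: fixed-size sections).
    fragments: List[str] = []
    for start in range(0, len(sentences), section_size):
        section = " ".join(sentences[start:start + section_size])
        fragments.extend(generate(section))

    # 2. Score every sentence by its summed similarity to the fragments
    #    (assumed aggregation; the abstract does not specify one).
    scores = [
        sum(similarity(sentence, fragment) for fragment in fragments)
        for sentence in sentences
    ]

    # 3. Keep the top-scoring sentences and restore document order.
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:num_sentences])
    return [sentences[i] for i in chosen]
```

To get closer to the described method, the `similarity` argument could be replaced with a function that wraps a BERTScore implementation, and `generate` with a call to a model fine-tuned to produce queries or titles; neither is shown here.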

Benchmarks

Benchmark                        Methodology    ROUGE-1  ROUGE-2  ROUGE-L
text-summarization-on-arxiv      GenCompareSum  39.96    15.15    36.19
text-summarization-on-cord-19    GenCompareSum  41.02    13.79    37.25
text-summarization-on-pubmed-1   GenCompareSum  42.10    16.51    38.25
text-summarization-on-s2orc      GenCompareSum  43.39    16.84    39.82
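
For readers unfamiliar with the metrics, ROUGE-1 and ROUGE-2 measure unigram and bigram overlap between a generated summary and a reference summary. The sketch below is a simplified illustration only: the function names are hypothetical, tokenization is plain whitespace splitting with no stemming, and it reports F1 on a 0-100 scale. This page does not state which ROUGE implementation or variant produced the scores above, and ROUGE-L (based on the longest common subsequence) is not covered.

```python
from collections import Counter
from typing import List


def ngrams(tokens: List[str], n: int) -> Counter:
    """Count the n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_f1(candidate: str, reference: str, n: int = 1) -> float:
    """Simplified ROUGE-N F1 on a 0-100 scale (illustrative, not official)."""
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: each n-gram counts at most as often as it
    # appears in both the candidate and the reference.
    overlap = sum((cand & ref).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 100 * 2 * precision * recall / (precision + recall)
```

For example, `rouge_n_f1("the cat sat", "the cat sat on the mat", n=1)` has precision 1.0 and recall 0.5, which gives an F1 of about 66.7.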
