Extractive Summarization as Text Matching

Abstract

Instead of following the commonly used framework of extracting sentences individually and modeling the relationship between sentences, formulate the extractive summarization task as a semantic text matching problem

→ a source document and candidate summaries (extracted from the original text) are matched in a semantic space

→ well-grounded in our comprehensive analysis of the inherent gap between sentence-level and summary-level extractors

Introduction

Automatic text summarization: compress a textual document into a shorter highlight while keeping the salient information of the original text
Extractive Summarization
- Most of the neural extractive summarization systems
  1. score and extract sentences one by one from the original text
  2. model the relationship between the sentences
  3. select several sentences to form a summary
  - Cheng and Lapata (2016); Nallapati et al. (2017)
    - formulate the extractive summarization task as a sequence labeling problem
    - make independent binary decisions for each sentence, resulting in high redundancy
  - Chen and Bansal, 2018; Jadhav and Rajan, 2018; Zhou et al., 2018
    - introduce an auto-regressive decoder
    - allow the scoring operations of different sentences to influence each other
  - Trigram Blocking (Paulus et al., 2017; Liu and Lapata, 2019)
    - When selecting sentences to form a summary, skip any sentence that shares a trigram with the previously selected sentences.
  ⇒ The above systems, which model the relationship between sentences, are essentially sentence-level extractors.
- We conduct an analysis on six benchmark datasets to better understand the advantages and limitations of sentence-level and summary-level approaches
  
  → There is indeed an inherent gap between the two approaches across these datasets
- MATCHSUM
  - conceptualize extractive summarization as a semantic text matching problem
  - "A good summary should be more semantically similar as a whole to the source document than the unqualified summaries."
  - a Siamese-BERT architecture to compute the similarity between the source document and the candidate summary
    - Siamese BERT leverages the pre-trained BERT in a Siamese network structure to derive semantically meaningful text embeddings that can be compared using cosine-similarity
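
Of the sentence-level techniques listed above, Trigram Blocking is simple enough to sketch in a few lines. The following is a minimal illustration, not the implementation from the cited papers: it assumes the sentences arrive already sorted by model score (highest first) and uses whitespace tokenization. The names `trigrams`, `trigram_blocking`, and `max_sentences` are hypothetical.

```python
def trigrams(sentence):
    """Return the set of word trigrams in a sentence (whitespace tokens, lowercased)."""
    tokens = sentence.lower().split()
    return {tuple(tokens[i:i + 3]) for i in range(len(tokens) - 2)}


def trigram_blocking(ranked_sentences, max_sentences=3):
    """Greedily keep sentences in ranked order, skipping any sentence that
    shares a trigram with a sentence that was already selected."""
    selected = []
    seen = set()  # trigrams of all sentences selected so far
    for sentence in ranked_sentences:
        grams = trigrams(sentence)
        if grams & seen:
            continue  # overlapping trigram: treat the sentence as redundant
        selected.append(sentence)
        seen |= grams
        if len(selected) == max_sentences:
            break
    return selected


# Assumed input: sentences already ordered by score, best first.
ranked = [
    "the cat sat on the mat",
    "yesterday the cat sat on the sofa",
    "dogs bark loudly at night",
]
summary = trigram_blocking(ranked, max_sentences=2)
```

Here the second sentence is skipped because it shares the trigram "the cat sat" with the first. The decision is still made one sentence at a time, which is why the notes classify this as a sentence-level method.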

Related Work: Two-Stage Summarization

the first stage usually extracts some fragments of the original text
the second stage selects from or modifies these fragments
Chen and Bansal (2018) and Bae et al. (2019): a hybrid extract-then-rewrite architecture
Lebanoff et al. (2019); Xu and Durrett (2019); Mendes et al. (2019): extract-then-compress learning paradigm
The MATCHSUM model can be viewed as an extract-then-match framework
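
The match step of such an extract-then-match pipeline can be sketched as ranking candidate summaries by their similarity to the whole document. This is only an illustration under strong assumptions: a bag-of-words count vector stands in for the Siamese-BERT embeddings, and the candidate summaries are taken as given. The helper names `embed`, `cosine`, and `best_candidate` are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in encoder (assumption): bag-of-words counts instead of BERT embeddings."""
    return Counter(text.lower().split())


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[t] * v[t] for t in u.keys() & v.keys())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_candidate(document, candidates):
    """Return the candidate summary most similar to the document as a whole."""
    doc_vec = embed(document)
    return max(candidates, key=lambda c: cosine(doc_vec, embed(c)))


document = "the storm closed the airport and delayed flights across the region"
candidates = [
    "the storm closed the airport",
    "a new cafe opened downtown",
]
chosen = best_candidate(document, candidates)
```

The point of the sketch is that each candidate is scored as a whole against the document, rather than scoring sentences one by one. With a learned encoder in place of `embed`, the same ranking loop would apply.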

Sentence-Level or Summary-Level?

Questions!

For extractive summarization, is the summary-level extractor better than the sentence-level extractor?
Given a dataset, which extractor should we choose based on the characteristics of the data, and what is the inherent gap between these two extractors?

Definition

Document: $D = \{ s_1, \dots, s_n \}$