Instead of following the commonly used framework of extracting sentences individually and modeling the relationship between sentences,
formulate the extractive summarization task as a semantic text matching problem
→ a source document and candidate summaries (extracted from the original text) are matched in a semantic space
→ well-grounded in our comprehensive analysis of the inherent gap between sentence-level and summary-level extractors
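A minimal sketch of the matching idea, assuming a toy setup: candidate summaries are enumerated as fixed-size subsets of the document's sentences, and the candidate whose vector lies closest to the document's vector is selected. The bag-of-words `embed` function and the names `cosine` and `best_candidate` are hypothetical stand-ins for illustration only; they are not the encoder or scoring function of any actual system.

```python
import math
from collections import Counter
from itertools import combinations


def embed(text):
    """Toy stand-in for a learned encoder (assumption): bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_candidate(sentences, summary_size):
    """Score each candidate summary as a whole against the full document."""
    doc_vec = embed(" ".join(sentences))
    candidates = combinations(sentences, summary_size)
    return max(candidates, key=lambda c: cosine(embed(" ".join(c)), doc_vec))


sentences = [
    "the cat sat on the mat",
    "stocks fell sharply today",
    "the cat chased the mouse",
]
summary = best_candidate(sentences, summary_size=2)
print(summary)
```

The point of the sketch is that the unit being scored is the whole candidate summary, not an individual sentence.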
Most neural extractive summarization systems:
score and extract sentences one by one from the original text
model the relationship between the sentences
select several sentences to form a summary
⇒ Systems that model the relationship between sentences in this way are essentially sentence-level extractors.
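The sentence-level pipeline above can be sketched as follows, under a simplifying assumption: the scorer here is a hypothetical word-frequency heuristic (`sentence_scores`), standing in for whatever neural scorer a real system would use. Each sentence is scored on its own and the top `k` are kept.

```python
from collections import Counter


def sentence_scores(sentences):
    """Hypothetical scorer: mean document frequency of a sentence's words."""
    doc_counts = Counter(w for s in sentences for w in s.lower().split())
    scores = []
    for s in sentences:
        words = s.lower().split()
        if not words:
            scores.append(0.0)
            continue
        scores.append(sum(doc_counts[w] for w in words) / len(words))
    return scores


def sentence_level_extract(sentences, k):
    """Rank sentences individually, keep the top k, restore document order."""
    scores = sentence_scores(sentences)
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


sentences = [
    "the cat sat on the mat",
    "the cat sat on the rug",
    "stocks fell sharply today",
]
extracted = sentence_level_extract(sentences, k=2)
print(extracted)
```

Nothing in this loop evaluates the selected set as a whole: in the toy input, the two near-duplicate sentences both receive the top score and are both selected.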
We conduct an analysis on six benchmark datasets to better understand the advantages and limitations of sentence-level and summary-level approaches
→ There is indeed an inherent gap between the two approaches across these datasets
MATCHSUM
For extractive summarization, is the summary-level extractor better than the sentence-level extractor?
Given a dataset, which extractor should we choose based on the characteristics of the data, and what is the inherent gap between these two extractors?
Definition
Document: $D = \{s_1, \dots, s_n\}$