CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19
Main authors: | Karotia, Akanksha; Susan, Seba |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US, 2023 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10131559/ https://www.ncbi.nlm.nih.gov/pubmed/37359325 http://dx.doi.org/10.1007/s11227-023-05291-3 |
field | value |
---|---|
author | Karotia, Akanksha; Susan, Seba |
collection | PubMed |
description | The number of research articles published on COVID-19 has increased dramatically since the outbreak of the pandemic in November 2019. This unprecedented rate of publication leads to information overload, and it has become increasingly urgent for researchers and medical associations to stay up to date on the latest COVID-19 studies. To address information overload in the COVID-19 scientific literature, the study presents CovSumm, a novel unsupervised hybrid model for single-document summarization, evaluated on the CORD-19 dataset. The proposed methodology was tested on the papers in the database dated from January 1, 2021 to December 31, 2021, 840 documents in total. The summarizer is a hybrid of two distinct extractive approaches: (1) GenCompareSum (transformer-based) and (2) TextRank (graph-based). The sum of the scores generated by both methods is used to rank the sentences when generating the summary. On CORD-19, the Recall-Oriented Understudy for Gisting Evaluation (ROUGE) metric is used to compare the performance of CovSumm with various state-of-the-art techniques. The proposed method achieved the highest scores: ROUGE-1 40.14%, ROUGE-2 13.25%, and ROUGE-L 36.32%, showing improved performance on CORD-19 compared with existing unsupervised text summarization methods. |
format | Online Article Text |
id | pubmed-10131559 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
journal | J Supercomput |
published online | 2023-04-26 |
rights | © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19 |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10131559/ https://www.ncbi.nlm.nih.gov/pubmed/37359325 http://dx.doi.org/10.1007/s11227-023-05291-3 |
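The ranking step described in the abstract (the per-sentence scores of the two methods are summed, and the top-ranked sentences form the summary) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. GenCompareSum is not reimplemented here: its per-sentence scores are assumed to be precomputed and passed in as plain numbers. The TextRank part is a PageRank-style iteration over a bag-of-words cosine-similarity graph. The function names (`textrank_scores`, `hybrid_summary`), the damping factor 0.85, the iteration count, and the top-k selection are all hypothetical choices. The record also does not say whether the two scores are normalized before summing, so the sketch uses a plain sum.

```python
import math
import re
from collections import Counter


def tokenize(sentence: str) -> list[str]:
    """Lowercase alphanumeric tokens; a deliberately simple stand-in tokenizer."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def textrank_scores(sentences: list[str], damping: float = 0.85, iters: int = 50) -> list[float]:
    """PageRank-style scores on a weighted sentence-similarity graph (TextRank idea)."""
    vecs = [Counter(tokenize(s)) for s in sentences]
    n = len(sentences)
    # Edge weight = cosine similarity; no self-loops.
    w = [[cosine(vecs[i], vecs[j]) if i != j else 0.0 for j in range(n)] for i in range(n)]
    out_sum = [sum(row) for row in w]
    scores = [1.0 / n] * n
    for _ in range(iters):
        # Each sentence receives score from its neighbours in proportion to edge weight.
        scores = [
            (1 - damping) / n
            + damping * sum(w[j][i] / out_sum[j] * scores[j] for j in range(n) if out_sum[j])
            for i in range(n)
        ]
    return scores


def hybrid_summary(sentences: list[str], transformer_scores: list[float], k: int = 2) -> list[str]:
    """Rank by TextRank score + (assumed precomputed) transformer score; keep top k in document order."""
    if len(sentences) != len(transformer_scores):
        raise ValueError("need exactly one transformer score per sentence")
    tr = textrank_scores(sentences)
    combined = [a + b for a, b in zip(tr, transformer_scores)]
    top = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    # Toy sentences and made-up transformer scores, for illustration only.
    doc = [
        "The virus spreads through droplets.",
        "Masks reduce droplet spread of the virus.",
        "Vaccines lower hospitalization rates.",
        "Hospitalization rates fell after vaccines.",
    ]
    print(hybrid_summary(doc, [0.9, 0.4, 0.7, 0.2], k=2))
```

Restoring the selected sentences to their original document order is a common choice for extractive summaries. It is an assumption here, not something the record specifies.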