CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19

The number of research articles published on COVID-19 has dramatically increased since the outbreak of the pandemic in November 2019. This absurd rate of productivity in research articles leads to information overload. It has increasingly become urgent for researchers and medical associations to sta...

Bibliographic Details
Main Authors: Karotia, Akanksha; Susan, Seba
Format: Online Article Text
Language: English
Published: Springer US 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10131559/
https://www.ncbi.nlm.nih.gov/pubmed/37359325
http://dx.doi.org/10.1007/s11227-023-05291-3
_version_ 1785031203355623424
author Karotia, Akanksha
Susan, Seba
collection PubMed
description The number of research articles published on COVID-19 has dramatically increased since the outbreak of the pandemic in November 2019. This absurd rate of productivity in research articles leads to information overload. It has increasingly become urgent for researchers and medical associations to stay up to date on the latest COVID-19 studies. To address information overload in COVID-19 scientific literature, the study presents a novel hybrid model named CovSumm, an unsupervised graph-based hybrid approach for single-document summarization, that is evaluated on the CORD-19 dataset. We have tested the proposed methodology on the scientific papers in the database dated from January 1, 2021 to December 31, 2021, consisting of 840 documents in total. The proposed text summarization is a hybrid of two distinctive extractive approaches (1) GenCompareSum (transformer-based approach) and (2) TextRank (graph-based approach). The sum of scores generated by both methods is used to rank the sentences for generating the summary. On the CORD-19, the recall-oriented understudy for gisting evaluation (ROUGE) score metric is used to compare the performance of the CovSumm model with various state-of-the-art techniques. The proposed method achieved the highest scores of ROUGE-1: 40.14%, ROUGE-2: 13.25%, and ROUGE-L: 36.32%. The proposed hybrid approach shows improved performance on the CORD-19 dataset when compared to existing unsupervised text summarization methods.
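The score-fusion step described in the abstract (summing the per-sentence scores of two extractive methods, then ranking) can be sketched as follows. This is a minimal illustration, not the authors' implementation: `textrank_scores` uses a simple word-overlap similarity and plain power iteration, the second score list merely stands in for GenCompareSum output (which requires transformer models not reproduced here), and the function names, the lack of score normalization, and the choice to return selected sentences in document order are all assumptions.

```python
import math
import re


def tokenize(sentence):
    """Return the set of lowercase word tokens (deliberately simple)."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def textrank_scores(sentences, damping=0.85, iterations=50):
    """Score sentences by PageRank over a word-overlap similarity graph."""
    n = len(sentences)
    if n == 0:
        return []
    tokens = [tokenize(s) for s in sentences]
    # Edge weight: shared words, normalized by the log sizes of both sentences.
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or len(tokens[i]) < 2 or len(tokens[j]) < 2:
                continue  # skip self-loops and one-word sentences (log(1) == 0)
            overlap = len(tokens[i] & tokens[j])
            norm = math.log(len(tokens[i])) + math.log(len(tokens[j]))
            weights[i][j] = overlap / norm
    out_sum = [sum(row) for row in weights]
    scores = [1.0 / n] * n
    for _ in range(iterations):
        scores = [
            (1 - damping) / n
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def fuse_and_select(sentences, scores_a, scores_b, k):
    """Sum two score lists and return the k best sentences in document order."""
    if not (len(sentences) == len(scores_a) == len(scores_b)):
        raise ValueError("sentences and both score lists must have equal length")
    combined = [a + b for a, b in zip(scores_a, scores_b)]
    # Indices of the k highest combined scores (assumption: no normalization).
    top = sorted(range(len(sentences)), key=lambda i: combined[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

A caller would compute `textrank_scores(sentences)`, obtain a second list of per-sentence scores from a transformer-based scorer, and pass both to `fuse_and_select(sentences, graph_scores, transformer_scores, k)`. Because the two methods may produce scores on different scales, rescaling each list before summing may be worth considering; the abstract does not say whether the authors do so.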
format Online
Article
Text
id pubmed-10131559
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-10131559 2023-04-27 CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19 Karotia, Akanksha Susan, Seba J Supercomput Article Springer US 2023-04-26 /pmc/articles/PMC10131559/ /pubmed/37359325 http://dx.doi.org/10.1007/s11227-023-05291-3 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2023, Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title CovSumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for CORD-19
title_sort covsumm: an unsupervised transformer-cum-graph-based hybrid document summarization model for cord-19
topic Article