Abstractive text summarization of low-resourced languages using deep learning

Bibliographic details
Main authors: Shafiq, Nida; Hamid, Isma; Asif, Muhammad; Nawaz, Qamar; Aljuaid, Hanan; Ali, Hamid
Format: Online, Article, Text
Language: English
Published: PeerJ Inc., 2023
Subjects: Data Mining and Machine Learning
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280265/
https://www.ncbi.nlm.nih.gov/pubmed/37346684
http://dx.doi.org/10.7717/peerj-cs.1176
collection PubMed
description BACKGROUND: Humans must be able to cope with the huge amounts of information produced by the information technology revolution. As a result, automatic text summarization is being employed in a range of industries to help people identify the most important information. Two main approaches to text summarization are considered: extractive and abstractive. The extractive approach selects chunks of sentences from the source documents, while the abstractive approach generates a new summary based on mined keywords. For low-resourced languages such as Urdu, various models and algorithms have been applied to extractive summarization; however, abstractive summarization in Urdu is still a challenging task. Because there are so many literary works in Urdu, producing abstractive summaries demands extensive research. METHODOLOGY: This article proposes a deep learning model for the Urdu language, using the Urdu 1 Million news dataset, and compares its performance with two widely used machine learning methods: support vector machine (SVM) and logistic regression (LR). The results show that the proposed deep learning model performs better than both. The extractive summaries are then processed with the encoder-decoder paradigm to create an abstractive summary. RESULTS: With the help of Urdu language specialists, the system-generated summaries were validated, showing the proposed model’s improvement and accuracy.
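To make the extractive side of the pipeline described above more concrete, the following is a minimal sketch of a generic frequency-based extractive summarizer. It is not the deep learning model, SVM, or LR classifier evaluated in the article, and it does not cover the encoder-decoder (abstractive) step. The function names `split_sentences` and `extractive_summary`, the sentence-delimiter set (which assumes the Urdu full stop "۔", U+06D4, and the Arabic question mark "؟", U+061F), and the scoring rule are illustrative assumptions. Tokenization is naive and does not handle Urdu diacritics or word-segmentation issues.

```python
import re
from collections import Counter

# Assumed sentence-ending punctuation: ASCII . ! ? plus the Urdu full stop
# (U+06D4) and the Arabic question mark (U+061F). Split on whitespace after them.
_SENTENCE_END = re.compile(r"(?<=[.!?\u06D4\u061F])\s+")
# Naive tokenizer: runs of Unicode word characters.
_TOKEN = re.compile(r"\w+")


def split_sentences(text: str) -> list[str]:
    """Split text into sentences on the assumed delimiters, dropping empties."""
    return [s.strip() for s in _SENTENCE_END.split(text.strip()) if s.strip()]


def extractive_summary(text: str, k: int = 2) -> list[str]:
    """Return the k highest-scoring sentences, copied verbatim, in source order.

    Hypothetical scoring rule: a sentence's score is the mean frequency
    (counted over the whole text) of its lowercased tokens.
    """
    sentences = split_sentences(text)
    freq = Counter(tok.lower() for tok in _TOKEN.findall(text))

    def score(sentence: str) -> float:
        tokens = [t.lower() for t in _TOKEN.findall(sentence)]
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    # Rank sentence indices by descending score; Python's sort is stable,
    # so ties keep the earlier sentence first.
    ranked = sorted(range(len(sentences)), key=lambda i: -score(sentences[i]))
    # Restore source order so the extract reads naturally.
    chosen = sorted(ranked[:k])
    return [sentences[i] for i in chosen]


if __name__ == "__main__":
    doc = ("Urdu is spoken widely. Summarization helps readers. "
           "Urdu summarization is hard. Cats sleep.")
    print(extractive_summary(doc, k=2))
```

Because the selected sentences are copied unchanged from the source, the output illustrates what "extractive" means; an abstractive model would instead generate new text, for example from such an extract.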
id pubmed-10280265
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PeerJ Comput Sci (PeerJ Computer Science), PeerJ Inc., 2023-01-13
rights ©2023 Shafiq et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
topic Data Mining and Machine Learning