Abstractive text summarization of low-resourced languages using deep learning
BACKGROUND: Humans must be able to cope with the huge amounts of information produced by the information technology revolution. As a result, automatic text summarization is being employed in a range of industries to assist individuals in identifying the most important information. For text summariza...
Main Authors: | Shafiq, Nida; Hamid, Isma; Asif, Muhammad; Nawaz, Qamar; Aljuaid, Hanan; Ali, Hamid |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2023 |
Subjects: | Data Mining and Machine Learning |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280265/ https://www.ncbi.nlm.nih.gov/pubmed/37346684 http://dx.doi.org/10.7717/peerj-cs.1176 |
_version_ | 1785060760807800832 |
---|---|
author | Shafiq, Nida; Hamid, Isma; Asif, Muhammad; Nawaz, Qamar; Aljuaid, Hanan; Ali, Hamid |
author_facet | Shafiq, Nida; Hamid, Isma; Asif, Muhammad; Nawaz, Qamar; Aljuaid, Hanan; Ali, Hamid |
author_sort | Shafiq, Nida |
collection | PubMed |
description | BACKGROUND: Humans must be able to cope with the huge amounts of information produced by the information technology revolution. As a result, automatic text summarization is being employed in a range of industries to assist individuals in identifying the most important information. Two approaches to text summarization are mainly considered: extractive and abstractive. The extractive approach selects chunks of sentences as they appear in the source documents, while the abstractive approach can generate a summary based on mined keywords. For low-resourced languages such as Urdu, extractive summarization has been addressed with various models and algorithms. However, abstractive summarization in Urdu remains a challenging task. Because there are so many literary works in Urdu, producing abstractive summaries demands extensive research. METHODOLOGY: This article proposes a deep learning model for the Urdu language using the Urdu 1 Million news dataset and compares its performance with two widely used machine learning methods, support vector machine (SVM) and logistic regression (LR). The results show that the proposed deep learning model performs better than the other two approaches. The summaries produced by extractive summarization are processed using the encoder-decoder paradigm to create an abstractive summary. RESULTS: The system-generated summaries were validated with the help of Urdu language specialists, showing the proposed model’s improvement and accuracy. |
format | Online Article Text |
id | pubmed-10280265 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-102802652023-06-21 Abstractive text summarization of low-resourced languages using deep learning Shafiq, Nida Hamid, Isma Asif, Muhammad Nawaz, Qamar Aljuaid, Hanan Ali, Hamid PeerJ Comput Sci Data Mining and Machine Learning BACKGROUND: Humans must be able to cope with the huge amounts of information produced by the information technology revolution. As a result, automatic text summarization is being employed in a range of industries to assist individuals in identifying the most important information. For text summarization, two approaches are mainly considered: text summarization by the extractive and abstractive methods. The extractive summarisation approach selects chunks of sentences like source documents, while the abstractive approach can generate a summary based on mined keywords. For low-resourced languages, e.g., Urdu, extractive summarization uses various models and algorithms. However, the study of abstractive summarization in Urdu is still a challenging task. Because there are so many literary works in Urdu, producing abstractive summaries demands extensive research. METHODOLOGY: This article proposed a deep learning model for the Urdu language by using the Urdu 1 Million news dataset and compared its performance with the two widely used methods based on machine learning, such as support vector machine (SVM) and logistic regression (LR). The results show that the suggested deep learning model performs better than the other two approaches. The summaries produced by extractive summaries are processed using the encoder-decoder paradigm to create an abstractive summary. RESULTS: With the help of Urdu language specialists, the system-generated summaries were validated, showing the proposed model’s improvement and accuracy. PeerJ Inc. 2023-01-13 /pmc/articles/PMC10280265/ /pubmed/37346684 http://dx.doi.org/10.7717/peerj-cs.1176 Text en ©2023 Shafiq et al. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited. |
spellingShingle | Data Mining and Machine Learning Shafiq, Nida Hamid, Isma Asif, Muhammad Nawaz, Qamar Aljuaid, Hanan Ali, Hamid Abstractive text summarization of low-resourced languages using deep learning |
title | Abstractive text summarization of low-resourced languages using deep learning |
title_full | Abstractive text summarization of low-resourced languages using deep learning |
title_fullStr | Abstractive text summarization of low-resourced languages using deep learning |
title_full_unstemmed | Abstractive text summarization of low-resourced languages using deep learning |
title_short | Abstractive text summarization of low-resourced languages using deep learning |
title_sort | abstractive text summarization of low-resourced languages using deep learning |
topic | Data Mining and Machine Learning |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280265/ https://www.ncbi.nlm.nih.gov/pubmed/37346684 http://dx.doi.org/10.7717/peerj-cs.1176 |
work_keys_str_mv | AT shafiqnida abstractivetextsummarizationoflowresourcedlanguagesusingdeeplearning AT hamidisma abstractivetextsummarizationoflowresourcedlanguagesusingdeeplearning AT asifmuhammad abstractivetextsummarizationoflowresourcedlanguagesusingdeeplearning AT nawazqamar abstractivetextsummarizationoflowresourcedlanguagesusingdeeplearning AT aljuaidhanan abstractivetextsummarizationoflowresourcedlanguagesusingdeeplearning AT alihamid abstractivetextsummarizationoflowresourcedlanguagesusingdeeplearning |