
Correcting spelling mistakes in Persian texts with rules and deep learning methods

This study aims to develop a system for automatically correcting spelling errors in Persian texts using two approaches: one that relies on rules and a common spelling mistake list and another that uses a deep neural network. The list of 700 common misspellings was compiled, and a database of 55,000...
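The rule-based approach first has to decide which words are misspelled. Here is a minimal Python sketch of that detection step. It assumes naive whitespace tokenization and a plain set lookup: a token missing from the lexicon is flagged, and if the token appears in the list of common misspellings, the listed correction is returned directly. The names `LEXICON`, `COMMON_MISSPELLINGS`, and `detect_and_lookup` are hypothetical. The toy entries stand in for the study's word database and 700-entry misspelling list, which are not reproduced here. Flagged tokens with no listed correction come back as `None`. Handing them to the correction rules afterwards is an assumption about how the pieces fit together.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy data standing in for the study's word database and
# its list of 700 common misspellings (not reproduced here).
LEXICON: Set[str] = {"اتاق", "خواهش", "کتاب", "این"}
COMMON_MISSPELLINGS: Dict[str, str] = {"اطاق": "اتاق", "خاهش": "خواهش"}


def detect_and_lookup(
    sentence: str,
    lexicon: Set[str],
    misspellings: Dict[str, str],
) -> List[Tuple[str, Optional[str]]]:
    """Flag tokens missing from the lexicon and look up a known correction.

    Returns (token, correction) pairs. The correction is None when the token
    is not in the common-misspelling list (assumed to go to a rule stage).
    """
    flagged: List[Tuple[str, Optional[str]]] = []
    # Naive whitespace tokenization: assumes no ZWNJ or punctuation handling.
    for token in sentence.split():
        if token not in lexicon:
            flagged.append((token, misspellings.get(token)))
    return flagged
```

For example, `detect_and_lookup("این اطاق", LEXICON, COMMON_MISSPELLINGS)` returns `[("اطاق", "اتاق")]`. A real Persian tokenizer would also need to handle zero-width non-joiners and attached affixes, which this sketch ignores.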


Bibliographic details
Main authors: Kasmaiee, Sa., Kasmaiee, Si., Homayounpour, M.
Format: Online, Article, Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10652024/
https://www.ncbi.nlm.nih.gov/pubmed/37968293
http://dx.doi.org/10.1038/s41598-023-47295-2
author Kasmaiee, Sa.
Kasmaiee, Si.
Homayounpour, M.
collection PubMed
description This study aims to develop a system for automatically correcting spelling errors in Persian texts using two approaches: one based on rules and a list of common spelling mistakes, and another based on a deep neural network. In the rule-based approach, a list of 700 common misspellings was compiled, and a database of 55,000 common Persian words was used to identify spelling errors. A total of 112 rules were implemented for spelling correction, each providing suggested replacements for misspelled words. Evaluation used 2,500 sentences, and for each misspelled word the suggestion with the shortest Levenshtein distance was selected. In the deep learning approach, the base network was a deep encoder-decoder network built on long short-term memory (LSTM) units with a word embedding layer, for which FastText was chosen. The base network was then extended with convolutional and capsule layers. A database of 1.2 million sentences was created, with 800,000 for training, 200,000 for testing, and 200,000 for evaluation. Adding the capsule and convolutional layers yielded performance similar to that of the base network. In evaluation, the network achieved accuracy, precision, recall, F-measure, and bilingual evaluation understudy (BLEU) scores of 87%, 70%, 89%, 78%, and 84%, respectively.
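The rule-based evaluation picks the suggestion with the shortest Levenshtein distance from those the rules produce. Here is a minimal Python sketch of that selection step. It assumes the standard unit-cost edit distance (insertions, deletions, substitutions) computed over Unicode code points. `levenshtein` and `pick_closest` are hypothetical helper names, and the candidate lists below are toy examples. Breaking ties by keeping the first suggestion is also an assumption, since the record does not specify it.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between two strings."""
    # prev[j] holds the distance between the previous prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def pick_closest(misspelled: str, suggestions: List[str]) -> str:
    """Return the suggestion closest to the misspelled word by edit distance."""
    if not suggestions:
        raise ValueError("no suggestions to rank")
    # min() keeps the first of several equally close suggestions (assumed tie-break).
    return min(suggestions, key=lambda s: levenshtein(misspelled, s))
```

For instance, `pick_closest("اطاق", ["اتفاق", "اتاق"])` returns `"اتاق"`, at distance 1 versus 2 for `"اتفاق"`. Separately, the reported metrics are internally consistent if the F-measure is the balanced F1 score: 2 × 0.70 × 0.89 / (0.70 + 0.89) ≈ 0.78, which matches the reported 78%.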
format Online
Article
Text
id pubmed-10652024
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-10652024 2023-11-15 Correcting spelling mistakes in Persian texts with rules and deep learning methods Kasmaiee, Sa. Kasmaiee, Si. Homayounpour, M. Sci Rep Article
Nature Publishing Group UK 2023-11-15 /pmc/articles/PMC10652024/ /pubmed/37968293 http://dx.doi.org/10.1038/s41598-023-47295-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Correcting spelling mistakes in Persian texts with rules and deep learning methods
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10652024/
https://www.ncbi.nlm.nih.gov/pubmed/37968293
http://dx.doi.org/10.1038/s41598-023-47295-2