Correcting spelling mistakes in Persian texts with rules and deep learning methods
This study aims to develop a system for automatically correcting spelling errors in Persian texts using two approaches: one that relies on rules and a common spelling mistake list and another that uses a deep neural network. The list of 700 common misspellings was compiled, and a database of 55,000...
Main Authors: | Kasmaiee, Sa., Kasmaiee, Si., Homayounpour, M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10652024/ https://www.ncbi.nlm.nih.gov/pubmed/37968293 http://dx.doi.org/10.1038/s41598-023-47295-2 |
_version_ | 1785147669589524480 |
---|---|
author | Kasmaiee, Sa. Kasmaiee, Si. Homayounpour, M. |
author_sort | Kasmaiee, Sa. |
collection | PubMed |
description | This study aims to develop a system for automatically correcting spelling errors in Persian texts using two approaches: one that relies on rules and a list of common spelling mistakes, and another that uses a deep neural network. In the rule-based approach, a list of 700 common misspellings was compiled, and a database of 55,000 common Persian words was used to identify spelling errors. A total of 112 rules were implemented for spelling correction, each providing suggested words for a misspelled word. The approach was evaluated on 2,500 sentences, with the suggested word that has the shortest Levenshtein distance selected for evaluation. In the deep learning approach, a deep encoder-decoder network that utilized long short-term memory (LSTM) with a word embedding layer was used as the base network, with FastText chosen as the word embedding layer. The base network was enhanced by adding convolutional and capsule layers. A database of 1.2 million sentences was created, with 800,000 for training, 200,000 for testing, and 200,000 for evaluation. The results showed that the network's performance with capsule and convolutional layers was similar to that of the base network. The network performed well in evaluation, achieving accuracy, precision, recall, F-measure, and bilingual evaluation understudy (BLEU) scores of 87%, 70%, 89%, 78%, and 84%, respectively. |
format | Online Article Text |
id | pubmed-10652024 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10652024 2023-11-15 Correcting spelling mistakes in Persian texts with rules and deep learning methods Kasmaiee, Sa. Kasmaiee, Si. Homayounpour, M. Sci Rep Article Nature Publishing Group UK 2023-11-15 /pmc/articles/PMC10652024/ /pubmed/37968293 http://dx.doi.org/10.1038/s41598-023-47295-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Correcting spelling mistakes in Persian texts with rules and deep learning methods |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10652024/ https://www.ncbi.nlm.nih.gov/pubmed/37968293 http://dx.doi.org/10.1038/s41598-023-47295-2 |
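
The description says that, in the rule-based approach, each rule proposes candidate words and the one with the shortest Levenshtein distance is selected. The following is a minimal sketch of that selection step only, not the authors' implementation. The function names (`levenshtein`, `pick_suggestion`), the toy lexicon, and the candidate list are hypothetical. The sketch also assumes that candidates are filtered against the word database and that distance is measured from the misspelled word, which the record does not state.

```python
from typing import Iterable, Set


def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit cost for insertion, deletion, and substitution."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string costs i deletions
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def pick_suggestion(word: str, lexicon: Set[str], candidates: Iterable[str]) -> str:
    """Return the in-lexicon candidate closest to `word`.

    The word is returned unchanged if it is already in the lexicon
    or if no candidate is a known word (assumed behaviour).
    """
    if word in lexicon:
        return word
    valid = [c for c in candidates if c in lexicon]
    if not valid:
        return word
    return min(valid, key=lambda c: levenshtein(word, c))


# Toy data for illustration only; not taken from the paper's 55,000-word database.
toy_lexicon = {"کتاب", "کتابخانه"}
toy_candidates = ["کتابخانه", "کتاب"]
best = pick_suggestion("کتلب", toy_lexicon, toy_candidates)
```

Here `best` is the candidate that differs from the misspelled input by a single substitution. Ties are broken by candidate order, which is a design choice of this sketch rather than something the record specifies.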