A method for solving scriptio continua in Javanese manuscript transliteration
Many Javanese manuscripts in Indonesia are stored in museums and libraries. Most of these manuscripts were written in local scripts that are rarely used in everyday life, so a software application that helps people read them is valuable. An essential step in...
Main authors: | Widiarti, Anastasia Rita; Pulungan, Reza |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191592/ https://www.ncbi.nlm.nih.gov/pubmed/32373737 http://dx.doi.org/10.1016/j.heliyon.2020.e03827 |
_version_ | 1783527881255682048 |
---|---|
author | Widiarti, Anastasia Rita; Pulungan, Reza |
collection | PubMed |
description | Many Javanese manuscripts in Indonesia are stored in museums and libraries. Most of these manuscripts were written in local scripts that are rarely used in everyday life, so a software application that helps people read them is valuable. An essential step in automatic manuscript image transliteration is post-processing, which involves editing and concatenating syllables into words. The main difficulty in post-processing is that the script has no symbol for the space between words in a sentence; this is known as the scriptio continua problem. This paper proposes methods based on the backtracking algorithm to solve the scriptio continua problem in the post-processing step of Javanese manuscript image transliteration. The proposed methods use a depth-first search over relevant candidate words to decide whether or not to merge a new syllable. Applied to the concatenation of 17,687 syllables from the Hamong Tani book, using a dictionary of 49,801 words, the proposed methods give satisfactory results in terms of computation and accuracy. The implemented greedy and brute-force methods both reach an accuracy of 81.64%. However, the greedy-based method is more efficient and performs better than the brute-force method. |
format | Online Article Text |
id | pubmed-7191592 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-7191592 2020-05-05 A method for solving scriptio continua in Javanese manuscript transliteration Widiarti, Anastasia Rita; Pulungan, Reza Heliyon Article Elsevier 2020-04-28 /pmc/articles/PMC7191592/ /pubmed/32373737 http://dx.doi.org/10.1016/j.heliyon.2020.e03827 Text en © 2020 The Authors http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
title | A method for solving scriptio continua in Javanese manuscript transliteration |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7191592/ https://www.ncbi.nlm.nih.gov/pubmed/32373737 http://dx.doi.org/10.1016/j.heliyon.2020.e03827 |
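
The description above outlines a dictionary-based approach: syllables are merged into candidate words, and a depth-first search with backtracking decides where each word ends. The following is a minimal sketch of that general idea, not the authors' implementation. The function name `segment_syllables`, the `max_word_syllables` parameter, the longest-candidate-first ordering, and the toy tokens are all assumptions made for illustration; none of them come from the paper or its dictionary.

```python
from typing import List, Optional, Set


def segment_syllables(
    syllables: List[str],
    dictionary: Set[str],
    max_word_syllables: int = 6,  # assumed cap on syllables per word
) -> Optional[List[str]]:
    """Split a space-free syllable sequence into dictionary words.

    Depth-first search with backtracking: at each position, try the
    longest candidate word first and fall back to shorter ones when the
    remainder of the sequence cannot be segmented.
    Returns None if no full segmentation exists.
    """
    dead_ends: Set[int] = set()  # positions already known to fail

    def dfs(start: int) -> Optional[List[str]]:
        if start == len(syllables):
            return []  # every syllable has been consumed
        if start in dead_ends:
            return None
        longest_end = min(len(syllables), start + max_word_syllables)
        for end in range(longest_end, start, -1):
            candidate = "".join(syllables[start:end])
            if candidate in dictionary:
                rest = dfs(end)
                if rest is not None:
                    return [candidate] + rest
                # otherwise backtrack and try a shorter candidate
        dead_ends.add(start)
        return None

    return dfs(0)


# Toy tokens only; not taken from the paper's 49,801-word dictionary.
print(segment_syllables(["ka", "ba", "ri"], {"kaba", "ka", "bari"}))
# prints ['ka', 'bari']
```

In the toy call, the search first accepts `kaba`, finds that the leftover `ri` is not a word, backtracks, and then settles on `ka` followed by `bari`. The sketch is recursive, so an input of many thousands of syllables would need an explicit stack or sentence-level chunking to stay within Python's recursion limit.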