Optical character recognition system for Baybayin scripts using support vector machine

In 2018, the Philippine Congress signed House Bill 1022 declaring the Baybayin script as the Philippines’ national writing system. In this regard, it is highly probable that the Baybayin and Latin scripts would appear in a single document. In this work, we propose a system that discriminates the cha...

Bibliographic Details
Main authors: Pino, Rodney, Mendoza, Renier, Sambayan, Rachelle
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Artificial Intelligence
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959605/
https://www.ncbi.nlm.nih.gov/pubmed/33817010
http://dx.doi.org/10.7717/peerj-cs.360
author Pino, Rodney
Mendoza, Renier
Sambayan, Rachelle
collection PubMed
description In 2018, the Philippine Congress signed House Bill 1022 declaring the Baybayin script as the Philippines’ national writing system. As a result, Baybayin and Latin scripts are likely to appear together in a single document. In this work, we propose a system that discriminates the characters of both scripts. The proposed system normalizes each individual character, identifies whether it belongs to the Baybayin or Latin script, and then classifies it according to the unit it represents. This gives us four classification problems, namely: (1) Baybayin and Latin script recognition, (2) Baybayin character classification, (3) Latin character classification, and (4) Baybayin diacritical marks classification. To the best of our knowledge, this is the first study that makes use of Support Vector Machine (SVM) for Baybayin script recognition. This work also provides a new dataset for Baybayin, its diacritics, and Latin characters. Classification problems (1) and (4) use binary SVM, while (2) and (3) use multiclass SVM classification. On average, our numerical experiments yield satisfactory results: (1) has 98.5% accuracy, 98.5% precision, 98.49% recall, and 98.5% F1 Score; (2) has 96.51% accuracy, 95.62% precision, 95.61% recall, and 95.62% F1 Score; (3) has 95.8% accuracy, 95.85% precision, 95.8% recall, and 95.83% F1 Score; and (4) has 100% accuracy, 100% precision, 100% recall, and 100% F1 Score.
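The description reports accuracy, precision, recall, and F1 score for each of the four classification problems. As a minimal sketch of how such figures can be computed from true and predicted labels, the following uses only the Python standard library. The function name `classification_metrics`, the toy labels, and the choice of macro-averaging (an unweighted mean over classes) are assumptions made for illustration; the record does not state which averaging scheme the authors used.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Return accuracy plus macro-averaged precision, recall, and F1.

    Macro-averaging is an assumption made for this sketch; the record
    does not say how the reported per-problem figures were averaged.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1   # correct prediction for this class
        else:
            fp[pred] += 1    # predicted class was wrong
            fn[truth] += 1   # true class was missed

    def ratio(num, den):
        # Convention used here: an undefined ratio counts as 0.0.
        return num / den if den else 0.0

    precisions = [ratio(tp[c], tp[c] + fp[c]) for c in labels]
    recalls = [ratio(tp[c], tp[c] + fn[c]) for c in labels]
    f1_scores = [ratio(2 * p * r, p + r) for p, r in zip(precisions, recalls)]

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1_scores) / n,
    }


# Toy labels for a script-recognition task like problem (1); not real data.
y_true = ["baybayin", "baybayin", "latin", "latin"]
y_pred = ["baybayin", "latin", "latin", "latin"]
metrics = classification_metrics(y_true, y_pred)
```

The same function applies unchanged to the binary problems (1) and (4) and to the multiclass problems (2) and (3), since it only counts per-class hits and misses.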
format Online
Article
Text
id pubmed-7959605
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-7959605 2021-04-02 Optical character recognition system for Baybayin scripts using support vector machine. Pino, Rodney; Mendoza, Renier; Sambayan, Rachelle. PeerJ Comput Sci. Artificial Intelligence. PeerJ Inc. 2021-02-15. /pmc/articles/PMC7959605/ /pubmed/33817010 http://dx.doi.org/10.7717/peerj-cs.360 Text en ©2021 Pino et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Optical character recognition system for Baybayin scripts using support vector machine
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7959605/
https://www.ncbi.nlm.nih.gov/pubmed/33817010
http://dx.doi.org/10.7717/peerj-cs.360