Thai Word Segmentation with a Brain-Inspired Sparse Distributed Representations Learning Memory

Bibliographic Details
Main authors: Soisoonthorn, Thasayu; Unger, Herwig; Maliyaem, Maleerat
Format: Online Article, Text
Language: English
Published: Hindawi, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10241586/
https://www.ncbi.nlm.nih.gov/pubmed/37284054
http://dx.doi.org/10.1155/2023/8592214
collection PubMed
description Word segmentation is a necessary step in many natural language processing tasks, especially for Thai, which is written without spaces between words. Incorrect segmentation severely degrades the final results. In this study, we propose two new brain-inspired methods for Thai word segmentation based on Hawkins' approach. Sparse Distributed Representations (SDRs), which model the structure of the brain's neocortex, are used to store and transfer information. The first method, THDICTSDR, improves the dictionary-based approach by using SDRs to learn the surrounding context and combining them with n-grams to select the correct word. The second method, THSDR, uses SDRs in place of a dictionary. Both methods are evaluated on the standard BEST2010 and LST20 datasets and compared with longest matching, newmm, and Deepcut, the state-of-the-art deep learning approach. The results show that the accuracy and performance of the first method are significantly better than those of other dictionary-based methods. The first method achieves an F1-score of 95.60%, comparable to the 96.34% of the state-of-the-art Deepcut. It achieves a higher F1-score of 96.78% when all vocabularies have been learned, and an F1-score of 99.48%, exceeding Deepcut's 97.65%, when all sentences have been learned. The second method is tolerant of noise and gives better overall results than deep learning in all cases.
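This record does not describe how THDICTSDR or THSDR are implemented. As a rough illustration of the terms the abstract relies on, the Python sketch below shows a greedy longest-matching segmenter (the simplest form of the dictionary baseline named above), a word-level F1-score computed over character spans, and the overlap score between two SDRs stored as sets of active bit indices. The function names, the toy word list, and the span-based F1 definition are assumptions for illustration only; they are not the authors' code and not the official BEST2010/LST20 evaluation scripts.

```python
from typing import List, Set, Tuple


def longest_matching(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right longest matching (hypothetical illustrative baseline).

    At each position, take the longest dictionary word starting there; if no
    word of length >= 2 matches, emit one character so the scan always advances.
    """
    max_len = max((len(w) for w in dictionary), default=1)
    words: List[str] = []
    i = 0
    while i < len(text):
        match = text[i]  # fallback: a single (possibly unknown) character
        # Try candidate lengths from longest to shortest (down to 2 characters).
        for j in range(min(len(text), i + max_len), i + 1, -1):
            if text[i:j] in dictionary:
                match = text[i:j]
                break
        words.append(match)
        i += len(match)
    return words


def word_spans(words: List[str]) -> Set[Tuple[int, int]]:
    """Convert a segmentation into (start, end) character offsets."""
    spans: Set[Tuple[int, int]] = set()
    start = 0
    for w in words:
        spans.add((start, start + len(w)))
        start += len(w)
    return spans


def word_f1(predicted: List[str], gold: List[str]) -> float:
    """Word-level F1 (assumed definition): a predicted word counts as correct
    only if both of its boundaries match a gold word. Both segmentations
    must cover the same underlying text."""
    pred, ref = word_spans(predicted), word_spans(gold)
    correct = len(pred & ref)
    if correct == 0:
        return 0.0
    precision = correct / len(pred)
    recall = correct / len(ref)
    return 2 * precision * recall / (precision + recall)


def sdr_overlap(a: Set[int], b: Set[int]) -> int:
    """Overlap score of two SDRs given as sets of active bit indices."""
    return len(a & b)


if __name__ == "__main__":
    lexicon = {"ไป", "โรง", "เรียน", "โรงเรียน"}  # toy word list, not a real dictionary
    gold = ["ไป", "โรงเรียน"]
    pred = longest_matching("ไปโรงเรียน", lexicon)
    print(pred, word_f1(pred, gold))
    print(word_f1(["ไป", "โรง", "เรียน"], gold))
    print(sdr_overlap({3, 17, 42, 99}, {17, 42, 64, 128}))
```

With the toy word list, longest matching prefers "โรงเรียน" over "โรง" + "เรียน", so it reproduces the gold segmentation exactly. Scoring the split ["ไป", "โรง", "เรียน"] against that gold segmentation gives precision 1/3, recall 1/2, and an F1-score of 2/5, which shows why a single wrong boundary is penalized on both sides of the metric.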
id pubmed-10241586
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Comput Intell Neurosci (Computational Intelligence and Neuroscience), Research Article, Hindawi, 2023-05-29 (record pubmed-10241586, 2023-06-06)
rights Copyright © 2023 Thasayu Soisoonthorn et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article