Thai Word Segmentation with a Brain-Inspired Sparse Distributed Representations Learning Memory
Word segmentation is a necessary step in many natural language processing tasks, especially for Thai, which is written without spaces between words. Incorrect segmentation, however, severely degrades the final result. In this study, we propose two new brain-inspired methods based on Hawkins' approach to address...
Main authors: | Soisoonthorn, Thasayu; Unger, Herwig; Maliyaem, Maleerat |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10241586/ https://www.ncbi.nlm.nih.gov/pubmed/37284054 http://dx.doi.org/10.1155/2023/8592214 |
_version_ | 1785054017942978560 |
---|---|
author | Soisoonthorn, Thasayu Unger, Herwig Maliyaem, Maleerat |
author_facet | Soisoonthorn, Thasayu Unger, Herwig Maliyaem, Maleerat |
author_sort | Soisoonthorn, Thasayu |
collection | PubMed |
description | Word segmentation is a necessary step in many natural language processing tasks, especially for Thai, which is written without spaces between words. Incorrect segmentation, however, severely degrades the final result. In this study, we propose two new brain-inspired methods based on Hawkins' approach to address Thai word segmentation. Sparse Distributed Representations (SDRs) are used to model how the neocortex of the brain stores and transfers information. The first proposed method, THDICTSDR, improves the dictionary-based approach by using SDRs to learn the surrounding context and combining them with n-grams to select the correct word. The second method, THSDR, uses SDRs instead of a dictionary. Both methods are evaluated on the standard BEST2010 and LST20 word segmentation datasets and compared with longest matching, newmm, and Deepcut, the state-of-the-art deep learning approach. The results show that the first method is significantly more accurate than the other dictionary-based methods. It achieves an F1-score of 95.60%, comparable to the state-of-the-art Deepcut at 96.34%. When all vocabulary has been learned, however, it reaches a higher F1-score of 96.78%. When all sentences have been learned, it achieves an F1-score of 99.48%, exceeding Deepcut's 97.65%. The second method is tolerant to noise and outperforms the deep learning approach overall in all cases. |
format | Online Article Text |
id | pubmed-10241586 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-10241586 2023-06-06 Thai Word Segmentation with a Brain-Inspired Sparse Distributed Representations Learning Memory Soisoonthorn, Thasayu Unger, Herwig Maliyaem, Maleerat Comput Intell Neurosci Research Article Hindawi 2023-05-29 /pmc/articles/PMC10241586/ /pubmed/37284054 http://dx.doi.org/10.1155/2023/8592214 Text en Copyright © 2023 Thasayu Soisoonthorn et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Soisoonthorn, Thasayu Unger, Herwig Maliyaem, Maleerat Thai Word Segmentation with a Brain-Inspired Sparse Distributed Representations Learning Memory |
title | Thai Word Segmentation with a Brain-Inspired Sparse Distributed Representations Learning Memory |
title_full | Thai Word Segmentation with a Brain-Inspired Sparse Distributed Representations Learning Memory |
title_fullStr | Thai Word Segmentation with a Brain-Inspired Sparse Distributed Representations Learning Memory |
title_full_unstemmed | Thai Word Segmentation with a Brain-Inspired Sparse Distributed Representations Learning Memory |
title_short | Thai Word Segmentation with a Brain-Inspired Sparse Distributed Representations Learning Memory |
title_sort | thai word segmentation with a brain-inspired sparse distributed representations learning memory |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10241586/ https://www.ncbi.nlm.nih.gov/pubmed/37284054 http://dx.doi.org/10.1155/2023/8592214 |
work_keys_str_mv | AT soisoonthornthasayu thaiwordsegmentationwithabraininspiredsparsedistributedrepresentationslearningmemory AT ungerherwig thaiwordsegmentationwithabraininspiredsparsedistributedrepresentationslearningmemory AT maliyaemmaleerat thaiwordsegmentationwithabraininspiredsparsedistributedrepresentationslearningmemory |
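
The abstract names longest matching as a dictionary-based baseline and reports results as F1-scores. The following is a minimal sketch of both ideas, not the authors' implementation: the function names (`longest_matching`, `spans`, `word_f1`), the four-word toy dictionary, and the choice of gold segmentation are all assumptions made for illustration. The example string ตากลม can be read as ตา + กลม ("round eyes") or ตาก + ลม ("exposed to the wind"), so a greedy dictionary lookup with no context can pick the wrong split, which is the kind of error that context learning is meant to reduce.

```python
from typing import Iterable, List, Set, Tuple


def longest_matching(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation: always take the longest dictionary word."""
    max_len = max((len(w) for w in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        # Try the longest candidate first and shrink until a dictionary hit.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                tokens.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one character as a fallback.
            tokens.append(text[i])
            i += 1
    return tokens


def spans(tokens: Iterable[str]) -> Set[Tuple[int, int]]:
    """Convert a token list into a set of (start, end) character spans."""
    out: Set[Tuple[int, int]] = set()
    pos = 0
    for tok in tokens:
        out.add((pos, pos + len(tok)))
        pos += len(tok)
    return out


def word_f1(predicted: List[str], gold: List[str]) -> float:
    """Word-level F1: a predicted word counts only if both boundaries match."""
    p, g = spans(predicted), spans(gold)
    correct = len(p & g)
    if correct == 0:
        return 0.0
    precision, recall = correct / len(p), correct / len(g)
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy dictionary and gold segmentation (illustration only).
toy_dictionary = {"ตา", "ตาก", "กลม", "ลม"}
gold_tokens = ["ตา", "กลม"]

predicted_tokens = longest_matching("ตากลม", toy_dictionary)
print(predicted_tokens)                        # ['ตาก', 'ลม']
print(word_f1(predicted_tokens, gold_tokens))  # 0.0
```

Here the greedy pass commits to ตาก because it is the longest match at position 0, so neither predicted word shares both boundaries with the assumed gold segmentation and the word-level F1 is 0. Scoring on character spans rather than on token strings keeps repeated words at different positions from being counted as the same match.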