
Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning

Question classification is a crucial task for answer selection. It helps define the structure of question sentences through features extracted from a sentence, such as who, when, where, and how. In this paper, we propose a methodology to improve question classificati...


Bibliographic Details
Main Authors: Chotirat, Saranlita; Meesad, Phayung
Format: Online Article Text
Language: English
Published: Elsevier 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8554172/
https://www.ncbi.nlm.nih.gov/pubmed/34746470
http://dx.doi.org/10.1016/j.heliyon.2021.e08216
_version_ 1784591737259294720
author Chotirat, Saranlita
Meesad, Phayung
author_facet Chotirat, Saranlita
Meesad, Phayung
author_sort Chotirat, Saranlita
collection PubMed
description Question classification is a crucial task for answer selection. It helps define the structure of question sentences through features extracted from a sentence, such as who, when, where, and how. In this paper, we propose a methodology to improve question classification from text by using feature selection and word embedding techniques. We conducted several experiments to evaluate the proposed methodology on two datasets (the TREC-6 dataset and a Thai sentence dataset), using term frequency and combined term frequency-inverse document frequency features, including Unigram, Unigram+Bigram, and Unigram+Trigram. Both traditional machine learning and deep learning classifiers were used. The traditional classification models were Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM). The deep learning techniques were Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), and a hybrid model combining CNN and BiLSTM. The experimental results showed that our methodology based on Part-of-Speech (POS) tagging performed best at improving question classification accuracy. Question category classification achieved an average micro F1-score of 0.98 when the SVM model was applied with all POS tags added on the TREC-6 dataset. On the Thai sentence dataset, the highest average micro F1-score was 0.8, achieved by the CNN model with GloVe embeddings when focusing tags were added.
format Online
Article
Text
id pubmed-8554172
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-8554172 2021-11-05 Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning Chotirat, Saranlita Meesad, Phayung Heliyon Research Article
Elsevier 2021-10-19 /pmc/articles/PMC8554172/ /pubmed/34746470 http://dx.doi.org/10.1016/j.heliyon.2021.e08216 Text en © 2021 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Research Article
Chotirat, Saranlita
Meesad, Phayung
Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning
title Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning
title_full Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning
title_fullStr Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning
title_full_unstemmed Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning
title_short Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning
title_sort part-of-speech tagging enhancement to natural language processing for thai wh-question classification with deep learning
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8554172/
https://www.ncbi.nlm.nih.gov/pubmed/34746470
http://dx.doi.org/10.1016/j.heliyon.2021.e08216
work_keys_str_mv AT chotiratsaranlita partofspeechtaggingenhancementtonaturallanguageprocessingforthaiwhquestionclassificationwithdeeplearning
AT meesadphayung partofspeechtaggingenhancementtonaturallanguageprocessingforthaiwhquestionclassificationwithdeeplearning