Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning
Main authors: | Chotirat, Saranlita; Meesad, Phayung |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8554172/ https://www.ncbi.nlm.nih.gov/pubmed/34746470 http://dx.doi.org/10.1016/j.heliyon.2021.e08216 |
Field | Value |
---|---|
author | Chotirat, Saranlita; Meesad, Phayung |
collection | PubMed |
description | Question classification is a crucial task for answer selection. It can help define the structure of a question sentence through features extracted from it, such as the wh-words who, when, where, and how. In this paper, we propose a methodology that improves question classification by using feature selection and word embedding techniques. We conducted several experiments to evaluate the proposed methodology on two datasets (the TREC-6 dataset and a Thai sentence dataset), using term frequency (TF) and term frequency-inverse document frequency (TF-IDF) features built from unigrams, unigrams + bigrams, and unigrams + trigrams. Both traditional and deep learning classifiers were used. The traditional models were Multinomial Naïve Bayes, Logistic Regression, and Support Vector Machine (SVM); the deep learning models were Bidirectional Long Short-Term Memory (BiLSTM), Convolutional Neural Networks (CNN), and a hybrid model combining CNN and BiLSTM. The experimental results showed that the methodology based on Part-of-Speech (POS) tagging gave the largest improvement in question classification accuracy. On the TREC-6 dataset, the SVM model with all POS tags added achieved an average micro F1-score of 0.98. On the Thai sentence dataset, the highest average micro F1-score, 0.8, was achieved by the CNN model with GloVe embeddings when focusing tags were added. |
format | Online Article Text |
id | pubmed-8554172 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-8554172; 2021-11-05; Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning; Chotirat, Saranlita; Meesad, Phayung; Heliyon; Research Article; Elsevier 2021-10-19; /pmc/articles/PMC8554172/; /pubmed/34746470; http://dx.doi.org/10.1016/j.heliyon.2021.e08216; Text en; © 2021 The Authors; https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
title | Part-of-Speech tagging enhancement to natural language processing for Thai wh-question classification with deep learning |
topic | Research Article |
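
The core idea in the description above, enriching n-gram term-frequency features with POS tags before classification, can be illustrated with a small sketch. It is not the authors' implementation. It assumes a hypothetical fusion scheme in which each word is joined to its tag (e.g. `who_WP`), uses made-up TREC-style questions with hand-assigned Penn Treebank-style tags instead of a real tagger or dataset, uses raw term frequency rather than TF-IDF, and implements only one of the listed classifiers (Multinomial Naïve Bayes) from scratch in standard-library Python. The names `featurize`, `MultinomialNB`, and `micro_f1` here are illustrative.

```python
import math
from collections import Counter


def featurize(tagged, ngram_max=2, add_pos=True):
    """Map [(word, tag), ...] to unigram..n-gram term-frequency counts.

    Hypothetical POS scheme: each word is fused with its tag
    ("who" + "WP" -> "who_WP"), so n-grams also encode syntax.
    """
    tokens = [f"{w.lower()}_{t}" if add_pos else w.lower() for w, t in tagged]
    feats = Counter()
    for n in range(1, ngram_max + 1):
        for i in range(len(tokens) - n + 1):
            feats[" ".join(tokens[i:i + n])] += 1
    return feats


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, X, y):
        self.classes = sorted(set(y))
        self.vocab = {f for x in X for f in x}
        priors = Counter(y)
        self.log_prior = {c: math.log(priors[c] / len(y)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for x, c in zip(X, y):
            self.counts[c].update(x)  # accumulate per-class feature counts
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def _log_score(self, x, c):
        denom = self.totals[c] + self.alpha * len(self.vocab)
        score = self.log_prior[c]
        for f, n in x.items():
            if f in self.vocab:  # features unseen in training are skipped
                score += n * math.log((self.counts[c][f] + self.alpha) / denom)
        return score

    def predict(self, X):
        return [max(self.classes, key=lambda c: self._log_score(x, c)) for x in X]


def micro_f1(y_true, y_pred):
    """Micro-averaged F1: pool TP, FP, FN over all classes, then compute F1."""
    tp = fp = fn = 0
    for c in set(y_true) | set(y_pred):
        tp += sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp += sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn += sum(t == c and p != c for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


# Toy, made-up questions with hand-assigned tags and TREC-6 coarse labels
# (HUM = human, LOC = location, NUM = numeric).
train = [
    ([("who", "WP"), ("wrote", "VBD"), ("hamlet", "NNP")], "HUM"),
    ([("who", "WP"), ("invented", "VBD"), ("radio", "NN")], "HUM"),
    ([("where", "WRB"), ("is", "VBZ"), ("bangkok", "NNP")], "LOC"),
    ([("where", "WRB"), ("is", "VBZ"), ("the", "DT"), ("louvre", "NNP")], "LOC"),
    ([("when", "WRB"), ("did", "VBD"), ("ww2", "NNP"), ("end", "VB")], "NUM"),
    ([("how", "WRB"), ("many", "JJ"), ("moons", "NNS"), ("has", "VBZ"), ("mars", "NNP")], "NUM"),
]
test = [
    ([("who", "WP"), ("painted", "VBD"), ("guernica", "NNP")], "HUM"),
    ([("where", "WRB"), ("is", "VBZ"), ("chiang", "NNP"), ("mai", "NNP")], "LOC"),
    ([("how", "WRB"), ("many", "JJ"), ("legs", "NNS"), ("has", "VBZ"), ("a", "DT"), ("spider", "NN")], "NUM"),
]

model = MultinomialNB().fit([featurize(q) for q, _ in train], [c for _, c in train])
preds = model.predict([featurize(q) for q, _ in test])
gold = [c for _, c in test]
print(preds, micro_f1(gold, preds))  # ['HUM', 'LOC', 'NUM'] 1.0
```

For single-label multiclass data like this, micro-averaged precision, recall, and F1 all reduce to accuracy, because every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class). Calling `featurize(q, add_pos=False)` gives a plain-word baseline for comparing the effect of the POS features.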