Mixed Script Identification Using Automated DNN Hyperparameter Optimization

Mixed script identification is a hindrance for automated natural language processing systems. Mixing cursive scripts of different languages is a challenge because NLP methods like POS tagging and word sense disambiguation suffer from noisy text. This study tackles the challenge of mixed script ident...

Bibliographic Details
Main authors: Yasir, Muhammad; Chen, Li; Khatoon, Amna; Malik, Muhammad Amir; Abid, Fazeel
Format: Online Article Text
Language: English
Published: Hindawi 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8683192/
https://www.ncbi.nlm.nih.gov/pubmed/34925496
http://dx.doi.org/10.1155/2021/8415333
author Yasir, Muhammad
Chen, Li
Khatoon, Amna
Malik, Muhammad Amir
Abid, Fazeel
collection PubMed
description Mixed script identification is a hindrance for automated natural language processing systems. Mixing cursive scripts of different languages is a challenge because NLP methods like POS tagging and word sense disambiguation suffer from noisy text. This study tackles the challenge of mixed script identification for a mixed-code dataset consisting of Roman Urdu, Hindi, Saraiki, Bengali, and English. The language identification model is trained using word vectorization and RNN variants. Through experimental investigation, architectures based on Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional Gated Recurrent Unit (Bi-GRU) are optimized for the task. The experiments achieved the highest accuracy, 90.17, with Bi-GRU, using learned word class features together with GloVe embeddings. The study also addresses issues that arise in multilingual environments, such as Roman words merged with English characters, generative spellings, and phonetic typing.
format Online
Article
Text
id pubmed-8683192
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8683192 2021-12-18 Comput Intell Neurosci Research Article Hindawi 2021-12-10 /pmc/articles/PMC8683192/ /pubmed/34925496 http://dx.doi.org/10.1155/2021/8415333 Text en Copyright © 2021 Muhammad Yasir et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Mixed Script Identification Using Automated DNN Hyperparameter Optimization
topic Research Article