
Mixed Script Identification Using Automated DNN Hyperparameter Optimization

Mixed script identification is a hindrance for automated natural language processing systems. Mixing cursive scripts of different languages is a challenge because NLP methods like POS tagging and word sense disambiguation suffer from noisy text. This study tackles the challenge of mixed script ident...


Bibliographic Details
Main authors:	Yasir, Muhammad; Chen, Li; Khatoon, Amna; Malik, Muhammad Amir; Abid, Fazeel
Format:	Online Article Text
Language:	English
Published:	Hindawi 2021
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8683192/ https://www.ncbi.nlm.nih.gov/pubmed/34925496 http://dx.doi.org/10.1155/2021/8415333

author	Yasir, Muhammad; Chen, Li; Khatoon, Amna; Malik, Muhammad Amir; Abid, Fazeel
collection	PubMed
description	Mixed script identification is a hindrance for automated natural language processing systems. Mixing cursive scripts of different languages is a challenge because NLP methods like POS tagging and word sense disambiguation suffer from noisy text. This study tackles the challenge of mixed script identification for a mixed-code dataset consisting of Roman Urdu, Hindi, Saraiki, Bengali, and English. The language identification model is trained using word vectorization and RNN variants. Moreover, through experimental investigation, different architectures are optimized for the task associated with Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional Gated Recurrent Unit (Bi-GRU). Experimentation achieved the highest accuracy of 90.17 for Bi-GRU, applying learned word class features along with embedding with GloVe. Moreover, this study addresses the issues related to multilingual environments, such as Roman words merged with English characters, generative spellings, and phonetic typing.
format	Online Article Text
id	pubmed-8683192
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Hindawi
record_format	MEDLINE/PubMed
spelling	pubmed-8683192 2021-12-18 Mixed Script Identification Using Automated DNN Hyperparameter Optimization Yasir, Muhammad; Chen, Li; Khatoon, Amna; Malik, Muhammad Amir; Abid, Fazeel Comput Intell Neurosci Research Article Mixed script identification is a hindrance for automated natural language processing systems. Mixing cursive scripts of different languages is a challenge because NLP methods like POS tagging and word sense disambiguation suffer from noisy text. This study tackles the challenge of mixed script identification for a mixed-code dataset consisting of Roman Urdu, Hindi, Saraiki, Bengali, and English. The language identification model is trained using word vectorization and RNN variants. Moreover, through experimental investigation, different architectures are optimized for the task associated with Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional Gated Recurrent Unit (Bi-GRU). Experimentation achieved the highest accuracy of 90.17 for Bi-GRU, applying learned word class features along with embedding with GloVe. Moreover, this study addresses the issues related to multilingual environments, such as Roman words merged with English characters, generative spellings, and phonetic typing. Hindawi 2021-12-10 /pmc/articles/PMC8683192/ /pubmed/34925496 http://dx.doi.org/10.1155/2021/8415333 Text en Copyright © 2021 Muhammad Yasir et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Mixed Script Identification Using Automated DNN Hyperparameter Optimization
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8683192/ https://www.ncbi.nlm.nih.gov/pubmed/34925496 http://dx.doi.org/10.1155/2021/8415333


Similar Items