Mixed Script Identification Using Automated DNN Hyperparameter Optimization
Main authors: | Yasir, Muhammad; Chen, Li; Khatoon, Amna; Malik, Muhammad Amir; Abid, Fazeel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8683192/ https://www.ncbi.nlm.nih.gov/pubmed/34925496 http://dx.doi.org/10.1155/2021/8415333 |
author | Yasir, Muhammad; Chen, Li; Khatoon, Amna; Malik, Muhammad Amir; Abid, Fazeel |
---|---|
collection | PubMed |
description | Mixed script text is a hindrance for automated natural language processing (NLP) systems. Mixing cursive scripts of different languages is challenging because NLP methods such as POS tagging and word sense disambiguation suffer from noisy text. This study tackles mixed script identification on a code-mixed dataset of Roman Urdu, Hindi, Saraiki, Bengali, and English. The language identification model is trained using word vectorization and RNN variants. Through experimental investigation, architectures based on Long Short-Term Memory (LSTM), Bidirectional LSTM, Gated Recurrent Unit (GRU), and Bidirectional Gated Recurrent Unit (Bi-GRU) are optimized for the task. The highest accuracy, 90.17%, was achieved by Bi-GRU using learned word class features together with GloVe embeddings. The study also addresses issues specific to multilingual environments, such as Roman words merged with English characters, generative spellings, and phonetic typing. |
format | Online Article Text |
id | pubmed-8683192 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-8683192 (2021-12-18); Comput Intell Neurosci; Research Article; Hindawi, 2021-12-10; Text, en. Copyright © 2021 Muhammad Yasir et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Mixed Script Identification Using Automated DNN Hyperparameter Optimization |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8683192/ https://www.ncbi.nlm.nih.gov/pubmed/34925496 http://dx.doi.org/10.1155/2021/8415333 |
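The abstract describes a token-level language identifier that feeds word embeddings into a bidirectional GRU and assigns one language label per token. The following is a minimal, untrained forward pass in plain Python illustrating that architecture. It is a sketch, not the authors' implementation: the class names, dimensions, and random weights are hypothetical, random vectors stand in for GloVe embeddings, and no training, word class features, or hyperparameter search is shown.

```python
import math
import random

# Label set taken from the five languages named in the abstract.
LANGS = ["Roman Urdu", "Hindi", "Saraiki", "Bengali", "English"]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def softmax(v):
    m = max(v)  # subtract the max for numerical stability
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

class GRUCell:
    """Standard GRU cell (update gate z, reset gate r, candidate n); weights are random, not trained."""

    def __init__(self, input_dim, hidden_dim, rng):
        self.hidden_dim = hidden_dim
        self.W = {g: rand_matrix(hidden_dim, input_dim, rng) for g in "zrn"}
        self.U = {g: rand_matrix(hidden_dim, hidden_dim, rng) for g in "zrn"}
        self.b = {g: [0.0] * hidden_dim for g in "zrn"}

    def _pre(self, gate, x, h):
        # W_g x + U_g h + b_g
        return [a + c + d for a, c, d in
                zip(matvec(self.W[gate], x), matvec(self.U[gate], h), self.b[gate])]

    def step(self, x, h_prev):
        z = [sigmoid(v) for v in self._pre("z", x, h_prev)]
        r = [sigmoid(v) for v in self._pre("r", x, h_prev)]
        reset_h = [ri * hi for ri, hi in zip(r, h_prev)]
        n = [math.tanh(v) for v in self._pre("n", x, reset_h)]
        # h_t = (1 - z) * n + z * h_{t-1}
        return [(1 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h_prev)]

    def run(self, xs):
        h = [0.0] * self.hidden_dim
        states = []
        for x in xs:
            h = self.step(x, h)
            states.append(h)
        return states

class BiGRUTagger:
    """Hypothetical Bi-GRU tagger: one forward and one backward GRU, concatenated per token."""

    def __init__(self, vocab, emb_dim=8, hidden_dim=6, seed=0):
        rng = random.Random(seed)
        # Random vectors stand in for pretrained GloVe embeddings (assumption).
        self.emb = {w: [rng.uniform(-1, 1) for _ in range(emb_dim)] for w in vocab}
        self.unk = [0.0] * emb_dim  # out-of-vocabulary tokens map to a zero vector
        self.fwd = GRUCell(emb_dim, hidden_dim, rng)
        self.bwd = GRUCell(emb_dim, hidden_dim, rng)
        self.out = rand_matrix(len(LANGS), 2 * hidden_dim, rng)

    def predict_proba(self, tokens):
        xs = [self.emb.get(t.lower(), self.unk) for t in tokens]
        hf = self.fwd.run(xs)
        hb = self.bwd.run(xs[::-1])[::-1]  # re-align backward states with token positions
        return [softmax(matvec(self.out, f + b)) for f, b in zip(hf, hb)]

    def predict(self, tokens):
        return [LANGS[p.index(max(p))] for p in self.predict_proba(tokens)]

# Hypothetical code-mixed input (Roman Urdu with an English word).
sentence = "mujhe yeh movie bohat pasand aayi".split()
tagger = BiGRUTagger(vocab=sorted(set(sentence)))
print(list(zip(sentence, tagger.predict(sentence))))
```

Because the weights are untrained, the printed labels are arbitrary. In a working system the embedding table would be initialized from GloVe vectors and all weights learned by backpropagation on labelled code-mixed text. The vocabulary is sorted before building embeddings so that the seeded random draws are reproducible across runs.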