
Event classification from the Urdu language text on social media

The real-time availability of the Internet has engaged millions of users around the world. Regional languages are preferred for effective and easy communication, which produces multilingual data on social networks and news channels. People share ideas, opinions, and events that...


Bibliographic Details
Main authors: Awan, Malik Daler Ali; Kajla, Nadeem Iqbal; Firdous, Amnah; Husnain, Mujtaba; Missen, Malik Muhammad Saad
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects: Artificial Intelligence
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8627225/
https://www.ncbi.nlm.nih.gov/pubmed/34901431
http://dx.doi.org/10.7717/peerj-cs.775
author Awan, Malik Daler Ali
Kajla, Nadeem Iqbal
Firdous, Amnah
Husnain, Mujtaba
Missen, Malik Muhammad Saad
collection PubMed
description The real-time availability of the Internet has engaged millions of users around the world. Regional languages are preferred for effective and easy communication, which produces multilingual data on social networks and news channels. People share ideas, opinions, and events happening globally (e.g., sports, inflation, protests, explosions, and sexual assault) in regional (local) languages on social media. Extraction and classification of events from multilingual data have become bottlenecks because of a lack of resources. In this research paper, we present the event classification task for Urdu-language text on social media and news channels using machine learning classifiers. The dataset contains more than 0.1 million (102,962) labeled instances of twelve (12) different types of events. The title, its length, and the last four words of a sentence are used as features to classify the events. Term Frequency-Inverse Document Frequency (tf-idf) showed the best results as a feature vector for evaluating the performance of six popular machine learning classifiers. Random Forest (RF) and K-Nearest Neighbor (KNN) outperformed the other classifiers, achieving 98.00% and 99.00% accuracy, respectively. The novelty lies in the fact that, to the best of our knowledge, the aforementioned features have not previously been applied to event extraction from text written in the Urdu language.
format Online
Article
Text
id pubmed-8627225
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8627225 2021-12-10 PeerJ Comput Sci Artificial Intelligence PeerJ Inc. 2021-11-18 /pmc/articles/PMC8627225/ /pubmed/34901431 http://dx.doi.org/10.7717/peerj-cs.775 Text en © 2021 Awan et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title Event classification from the Urdu language text on social media
topic Artificial Intelligence