An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding
The automated identification of toxicity in text is a crucial area of text analysis, since social media is replete with unfiltered content ranging from mildly abusive to downright hateful. Researchers have found unintended bias and unfairness caused by training datasets, which lead to inaccurate classification of toxic words in context… (an illustrative code sketch of the paper's two approaches follows the record below).
| Main Authors: | Alsharef, Ahmad; Aggarwal, Karan; Sonia; Koundal, Deepika; Alyami, Hashem; Ameyed, Darine |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Hindawi, 2022 |
| Subjects: | Research Article |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8863472/ https://www.ncbi.nlm.nih.gov/pubmed/35211168 http://dx.doi.org/10.1155/2022/8467349 |
_version_ | 1784655247929507840 |
---|---|
author | Alsharef, Ahmad; Aggarwal, Karan; Sonia; Koundal, Deepika; Alyami, Hashem; Ameyed, Darine |
author_facet | Alsharef, Ahmad; Aggarwal, Karan; Sonia; Koundal, Deepika; Alyami, Hashem; Ameyed, Darine |
author_sort | Alsharef, Ahmad |
collection | PubMed |
description | The automated identification of toxicity in text is a crucial area of text analysis, since social media is replete with unfiltered content ranging from mildly abusive to downright hateful. Researchers have found unintended bias and unfairness caused by training datasets, which lead to inaccurate classification of toxic words in context. In this paper, several approaches for locating toxicity in text are assessed and presented, with the aim of enhancing the overall quality of text classification. General unsupervised methods built on state-of-the-art models and external embeddings were used to improve accuracy while mitigating bias and raising the F1-score. The suggested approaches combined a long short-term memory (LSTM) deep learning model with GloVe word embeddings and with word embeddings generated by Bidirectional Encoder Representations from Transformers (BERT), respectively. These models were trained and tested on a large secondary qualitative dataset containing a large number of comments labeled as toxic or nontoxic. An accuracy of 94% and an F1-score of 0.89 were achieved using LSTM with BERT word embeddings in the binary classification of comments (toxic and nontoxic). The combination of LSTM and BERT performed better than both LSTM alone and LSTM with GloVe word embeddings. This paper addresses the problem of classifying comments with high accuracy by pretraining models on larger corpora of text (high-quality word embeddings) rather than on the training data alone. |
format | Online Article Text |
id | pubmed-8863472 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-8863472 2022-02-23 An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding Alsharef, Ahmad; Aggarwal, Karan; Sonia; Koundal, Deepika; Alyami, Hashem; Ameyed, Darine Comput Intell Neurosci Research Article The automated identification of toxicity in text is a crucial area of text analysis, since social media is replete with unfiltered content ranging from mildly abusive to downright hateful. Researchers have found unintended bias and unfairness caused by training datasets, which lead to inaccurate classification of toxic words in context. In this paper, several approaches for locating toxicity in text are assessed and presented, with the aim of enhancing the overall quality of text classification. General unsupervised methods built on state-of-the-art models and external embeddings were used to improve accuracy while mitigating bias and raising the F1-score. The suggested approaches combined a long short-term memory (LSTM) deep learning model with GloVe word embeddings and with word embeddings generated by Bidirectional Encoder Representations from Transformers (BERT), respectively. These models were trained and tested on a large secondary qualitative dataset containing a large number of comments labeled as toxic or nontoxic. An accuracy of 94% and an F1-score of 0.89 were achieved using LSTM with BERT word embeddings in the binary classification of comments (toxic and nontoxic). The combination of LSTM and BERT performed better than both LSTM alone and LSTM with GloVe word embeddings. This paper addresses the problem of classifying comments with high accuracy by pretraining models on larger corpora of text (high-quality word embeddings) rather than on the training data alone. Hindawi 2022-02-15 /pmc/articles/PMC8863472/ /pubmed/35211168 http://dx.doi.org/10.1155/2022/8467349 Text en Copyright © 2022 Ahmad Alsharef et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Alsharef, Ahmad; Aggarwal, Karan; Sonia; Koundal, Deepika; Alyami, Hashem; Ameyed, Darine An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding |
title | An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding |
title_full | An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding |
title_fullStr | An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding |
title_full_unstemmed | An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding |
title_short | An Automated Toxicity Classification on Social Media Using LSTM and Word Embedding |
title_sort | automated toxicity classification on social media using lstm and word embedding |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8863472/ https://www.ncbi.nlm.nih.gov/pubmed/35211168 http://dx.doi.org/10.1155/2022/8467349 |
work_keys_str_mv | AT alsharefahmad anautomatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT aggarwalkaran anautomatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT sonia anautomatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT koundaldeepika anautomatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT alyamihashem anautomatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT ameyeddarine anautomatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT alsharefahmad automatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT aggarwalkaran automatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT sonia automatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT koundaldeepika automatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT alyamihashem automatedtoxicityclassificationonsocialmediausinglstmandwordembedding AT ameyeddarine automatedtoxicityclassificationonsocialmediausinglstmandwordembedding |
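The description above pairs an LSTM classifier with two sources of pretrained word vectors: static GloVe embeddings and contextual BERT embeddings. What follows is a minimal sketch of the first approach, not the authors' published code: a Keras LSTM over frozen GloVe vectors for binary toxic/nontoxic classification. The file glove.6B.100d.txt, the two-comment toy dataset, and every hyperparameter (vocabulary size, sequence length, LSTM width) are illustrative assumptions.

```python
# Sketch: LSTM over frozen GloVe word embeddings for toxic/nontoxic comments.
# Assumes TensorFlow 2.x and a local copy of glove.6B.100d.txt.
import numpy as np
import tensorflow as tf

MAX_WORDS, MAX_LEN, EMB_DIM = 20000, 100, 100  # assumed, not from the paper

# Tiny stand-in dataset; the paper used a large labeled comment corpus.
texts = np.array(["have a great day", "you are a worthless idiot"])
labels = np.array([0, 1])  # 0 = nontoxic, 1 = toxic

# Map raw strings to padded integer token sequences.
vectorizer = tf.keras.layers.TextVectorization(
    max_tokens=MAX_WORDS, output_sequence_length=MAX_LEN)
vectorizer.adapt(texts)
vocab = vectorizer.get_vocabulary()

def glove_matrix(path, vocab, dim):
    """Build an embedding matrix from a GloVe text file; unseen words stay zero."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            word, *vals = line.rstrip().split(" ")
            vectors[word] = np.asarray(vals, dtype="float32")
    matrix = np.zeros((len(vocab), dim))
    for i, word in enumerate(vocab):
        if word in vectors:
            matrix[i] = vectors[word]
    return matrix

emb = glove_matrix("glove.6B.100d.txt", vocab, EMB_DIM)

model = tf.keras.Sequential([
    vectorizer,
    # Frozen pretrained vectors are the "external embeddings" the abstract credits.
    tf.keras.layers.Embedding(
        len(vocab), EMB_DIM,
        embeddings_initializer=tf.keras.initializers.Constant(emb),
        trainable=False),
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(1, activation="sigmoid"),  # toxicity probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(texts, labels, epochs=3, batch_size=2)
```

The BERT-embedding variant swaps the static lookup for contextual token vectors from a pretrained encoder while keeping the same LSTM head. The sketch below assumes the Hugging Face transformers library and the bert-base-uncased checkpoint, neither of which the abstract names; whether the authors froze or fine-tuned BERT is likewise not stated, so the encoder is frozen here for illustration.

```python
# Sketch: the same LSTM head fed by contextual BERT token embeddings.
import tensorflow as tf
from transformers import AutoTokenizer, TFAutoModel  # assumed dependency

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = TFAutoModel.from_pretrained("bert-base-uncased")
bert.trainable = False  # use BERT purely as a word-embedding generator (assumed)

enc = tok(["you are a worthless idiot"], padding="max_length", max_length=100,
          truncation=True, return_tensors="tf")
token_vecs = bert(**enc).last_hidden_state  # shape (batch, 100, 768)

head = tf.keras.Sequential([
    tf.keras.layers.LSTM(128),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
score = head(token_vecs)  # toxicity probability per comment
```

Under this reading, the only difference between the two approaches is where the word vectors come from; the classifier head is identical, which matches the paper's comparison of LSTM alone, LSTM with GloVe, and LSTM with BERT embeddings, the last of which reached 94% accuracy and an F1-score of 0.89.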