Asian hate speech detection on Twitter during COVID-19

Bibliographic Details
Main Authors:	Toliyat, Amir; Levitan, Sarah Ita; Peng, Zheng; Etemadpour, Ronak
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Artificial Intelligence
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9421075/ https://www.ncbi.nlm.nih.gov/pubmed/36046150 http://dx.doi.org/10.3389/frai.2022.932381

collection	PubMed
description	Coronavirus disease 2019 (COVID-19) emerged in Wuhan, China, in late 2019 and, after proving highly contagious in Asian countries, rapidly spread to other countries. The disease led governments worldwide to declare a public health crisis and to take severe measures to slow its spread. The pandemic affected the lives of millions of people. Many people who lost loved ones and jobs experienced a wide range of emotions, such as disbelief, shock, concern about their health, fear about food supplies, anxiety, and panic. All of these phenomena led to the spread of racism and hate against Asians in Western countries, especially in the United States. An analysis of official preliminary police data by the Center for the Study of Hate & Extremism at California State University shows that anti-Asian hate crime in 16 of America's largest cities increased by 149% in 2020. In this study, we first established a baseline of anti-Asian hate speech by Americans on Twitter. We then present an approach to balance the biased dataset and thereby improve the performance of tweet classification. We also downloaded 10 million tweets through the Twitter API v2; this study uses a small portion of them, and future work will use the entire dataset. Three thousand tweets from our collected corpus were annotated by four annotators: three Asian and one Asian-American. Using this data, we built predictive models of hate speech with various machine learning and deep learning methods. Our machine learning methods include Random Forest, K-nearest neighbors (KNN), Support Vector Machine (SVM), Extreme Gradient Boosting (XGBoost), Logistic Regression, Decision Tree, and Naive Bayes.
Our deep learning models include a basic Long Short-Term Memory (LSTM) network, a bidirectional LSTM, a bidirectional LSTM with dropout, a convolutional neural network, and Bidirectional Encoder Representations from Transformers (BERT). We also filtered out tweets that the annotators found ambiguous, based on low Fleiss' kappa agreement among annotators. In our final results, Logistic Regression achieved the best performance among the statistical machine learning models, with an F1 score of 0.72, while BERT achieved the best performance among the deep learning models, with an F1 score of 0.85.
id	pubmed-9421075
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Front Artif Intell (Frontiers in Artificial Intelligence)
published	Frontiers Media S.A. 2022-08-15
rights	Copyright © 2022 Toliyat, Levitan, Peng and Etemadpour. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

Similar Items