Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification

Bibliographic Details
Main Authors: Luo, Jia; Xue, Rui; Hu, Jinglu; El Baz, Didier
Format: Online Article Text
Language: English
Published: MDPI 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8469168/
https://www.ncbi.nlm.nih.gov/pubmed/34574868
http://dx.doi.org/10.3390/healthcare9091094
_version_ 1784573861369479168
author Luo, Jia
Xue, Rui
Hu, Jinglu
El Baz, Didier
author_facet Luo, Jia
Xue, Rui
Hu, Jinglu
El Baz, Didier
author_sort Luo, Jia
collection PubMed
description Misinformation posted on social media during COVID-19 is a prime example of infodemic data. The phenomenon was prominent in China at the beginning of the COVID-19 outbreak. While a lot of data can be collected from various social media platforms, publicly available infodemic detection data remains rare and is not easy to construct manually. Therefore, instead of developing techniques for infodemic detection, this paper aims to construct a Chinese infodemic dataset, “infodemic 2019”, by collecting widely spread Chinese infodemic records during the COVID-19 outbreak. Each record is labeled as true, false or questionable. After four rounds of adjustment, the original imbalanced dataset is converted into a balanced dataset by exploring the properties of the collected records. The final labels achieve high intercoder reliability with healthcare workers’ annotations, and the high-frequency words show a strong relationship between the proposed dataset and pandemic diseases. Finally, numerical experiments are carried out with RNN, CNN and fastText. All of them achieve reasonable performance and provide baselines for future work.
format Online
Article
Text
id pubmed-8469168
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8469168 2021-09-27 Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification Luo, Jia Xue, Rui Hu, Jinglu El Baz, Didier Healthcare (Basel) Article Misinformation posted on social media during COVID-19 is a prime example of infodemic data. The phenomenon was prominent in China at the beginning of the COVID-19 outbreak. While a lot of data can be collected from various social media platforms, publicly available infodemic detection data remains rare and is not easy to construct manually. Therefore, instead of developing techniques for infodemic detection, this paper aims to construct a Chinese infodemic dataset, “infodemic 2019”, by collecting widely spread Chinese infodemic records during the COVID-19 outbreak. Each record is labeled as true, false or questionable. After four rounds of adjustment, the original imbalanced dataset is converted into a balanced dataset by exploring the properties of the collected records. The final labels achieve high intercoder reliability with healthcare workers’ annotations, and the high-frequency words show a strong relationship between the proposed dataset and pandemic diseases. Finally, numerical experiments are carried out with RNN, CNN and fastText. All of them achieve reasonable performance and provide baselines for future work. MDPI 2021-08-24 /pmc/articles/PMC8469168/ /pubmed/34574868 http://dx.doi.org/10.3390/healthcare9091094 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Luo, Jia
Xue, Rui
Hu, Jinglu
El Baz, Didier
Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification
title Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification
title_full Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification
title_fullStr Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification
title_full_unstemmed Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification
title_short Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification
title_sort combating the infodemic: a chinese infodemic dataset for misinformation identification
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8469168/
https://www.ncbi.nlm.nih.gov/pubmed/34574868
http://dx.doi.org/10.3390/healthcare9091094
work_keys_str_mv AT luojia combatingtheinfodemicachineseinfodemicdatasetformisinformationidentification
AT xuerui combatingtheinfodemicachineseinfodemicdatasetformisinformationidentification
AT hujinglu combatingtheinfodemicachineseinfodemicdatasetformisinformationidentification
AT elbazdidier combatingtheinfodemicachineseinfodemicdatasetformisinformationidentification