Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification
Misinformation posted on social media during COVID-19 is a prime example of infodemic data. The phenomenon was prominent in China at the beginning of the COVID-19 outbreak. Although large amounts of data can be collected from various social media platforms, publicly available infodemic detection data remains r...
Main authors: | Luo, Jia; Xue, Rui; Hu, Jinglu; El Baz, Didier |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8469168/ https://www.ncbi.nlm.nih.gov/pubmed/34574868 http://dx.doi.org/10.3390/healthcare9091094 |
_version_ | 1784573861369479168 |
---|---|
author | Luo, Jia; Xue, Rui; Hu, Jinglu; El Baz, Didier |
author_sort | Luo, Jia |
collection | PubMed |
description | Misinformation posted on social media during COVID-19 is a prime example of infodemic data. The phenomenon was prominent in China at the beginning of the COVID-19 outbreak. Although large amounts of data can be collected from various social media platforms, publicly available infodemic detection data remains rare and is not easy to construct manually. Therefore, instead of developing techniques for infodemic detection, this paper constructs a Chinese infodemic dataset, “infodemic 2019”, by collecting widely spread Chinese infodemic records during the COVID-19 outbreak. Each record is labeled as true, false or questionable. After four rounds of adjustment, the originally imbalanced dataset is converted into a balanced one by exploring the properties of the collected records. The final labels achieve high intercoder reliability with healthcare workers’ annotations, and the high-frequency words show a strong relationship between the proposed dataset and pandemic diseases. Finally, numerical experiments are carried out with RNN, CNN and fastText. All of them achieve reasonable performance and provide baselines for future work. |
format | Online Article Text |
id | pubmed-8469168 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-8469168 2021-09-27 Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification. Luo, Jia; Xue, Rui; Hu, Jinglu; El Baz, Didier. Healthcare (Basel), Article. MDPI, 2021-08-24. /pmc/articles/PMC8469168/ /pubmed/34574868 http://dx.doi.org/10.3390/healthcare9091094 Text en © 2021 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Combating the Infodemic: A Chinese Infodemic Dataset for Misinformation Identification |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8469168/ https://www.ncbi.nlm.nih.gov/pubmed/34574868 http://dx.doi.org/10.3390/healthcare9091094 |