
The reproducibility of COVID-19 data analysis: paradoxes, pitfalls, and future challenges

In the midst of the COVID-19 experience, we learned an important scientific lesson: knowledge acquisition and information quality in medicine depend more on “data quality” than on “data quantity.” The large number of COVID-19 reports, published in a very short time, demonstrated that the most a...


Bibliographic Details
Main authors: Serio, Clelia Di; Malgaroli, Antonio; Ferrari, Paolo; Kenett, Ron S
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9896906/
https://www.ncbi.nlm.nih.gov/pubmed/36741433
http://dx.doi.org/10.1093/pnasnexus/pgac125
author Serio, Clelia Di
Malgaroli, Antonio
Ferrari, Paolo
Kenett, Ron S
collection PubMed
description In the midst of the COVID-19 experience, we learned an important scientific lesson: knowledge acquisition and information quality in medicine depend more on “data quality” than on “data quantity.” The large number of COVID-19 reports, published in a very short time, demonstrated that the most advanced statistical and computational tools cannot properly overcome the poor quality of acquired data. The main evidence for this observation comes from the poor reproducibility of results. Indeed, understanding the data generation process is fundamental when investigating scientific questions such as prevalence, immunity, transmissibility, and susceptibility. Most COVID-19 studies are case reports based on “non-probability” sampling and do not adhere to the general principles of controlled experimental designs. Data collected in this way suffer from many limitations when used to derive clinical conclusions. These include confounding factors, measurement errors, and selection bias. Each of these elements represents a source of uncertainty, which is often ignored or assumed to provide an unbiased random contribution. Inference from large data sets in medicine is also affected by data protection policies that, while protecting patients’ privacy, are likely to consistently reduce the usefulness of big data in achieving fundamental goals such as effective and efficient data integration. This limits the generalizability of scientific studies and leads to paradoxical and conflicting conclusions. We provide examples of such conclusions from assessing the role of risk factors. In conclusion, new paradigms and new design schemes are needed in order to reach inferential conclusions that are meaningful and informative when dealing with data collected during emergencies like COVID-19.
format Online
Article
Text
id pubmed-9896906
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9896906 2023-02-04 The reproducibility of COVID-19 data analysis: paradoxes, pitfalls, and future challenges Serio, Clelia Di Malgaroli, Antonio Ferrari, Paolo Kenett, Ron S PNAS Nexus Perspectives Oxford University Press 2022-08-23 /pmc/articles/PMC9896906/ /pubmed/36741433 http://dx.doi.org/10.1093/pnasnexus/pgac125 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the National Academy of Sciences. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title The reproducibility of COVID-19 data analysis: paradoxes, pitfalls, and future challenges
topic Perspectives