The reproducibility of COVID-19 data analysis: paradoxes, pitfalls, and future challenges
In the midst of the COVID-19 experience, we learned an important scientific lesson: knowledge acquisition and information quality in medicine depend more on “data quality” than on “data quantity.” The large number of COVID-19 reports, published in a very short time, demonstrated that the most a...
Main Authors: | Serio, Clelia Di; Malgaroli, Antonio; Ferrari, Paolo; Kenett, Ron S |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2022 |
Subjects: | Perspectives |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9896906/ https://www.ncbi.nlm.nih.gov/pubmed/36741433 http://dx.doi.org/10.1093/pnasnexus/pgac125 |
_version_ | 1784882144770785280 |
---|---|
author | Serio, Clelia Di; Malgaroli, Antonio; Ferrari, Paolo; Kenett, Ron S |
author_facet | Serio, Clelia Di; Malgaroli, Antonio; Ferrari, Paolo; Kenett, Ron S |
author_sort | Serio, Clelia Di |
collection | PubMed |
description | In the midst of the COVID-19 experience, we learned an important scientific lesson: knowledge acquisition and information quality in medicine depend more on “data quality” than on “data quantity.” The large number of COVID-19 reports, published in a very short time, demonstrated that the most advanced statistical and computational tools cannot properly overcome the poor quality of acquired data. The main evidence for this observation comes from the poor reproducibility of results. Indeed, understanding the data generation process is fundamental when investigating scientific questions such as prevalence, immunity, transmissibility, and susceptibility. Most COVID-19 studies are case reports based on “non-probability” sampling and do not adhere to the general principles of controlled experimental designs. Data collected in this way suffer from many limitations when used to derive clinical conclusions. These include confounding factors, measurement errors, and selection bias. Each of these elements represents a source of uncertainty, which is often ignored or assumed to provide an unbiased random contribution. Inference drawn from large data sets in medicine is also affected by data protection policies that, while protecting patients’ privacy, are likely to consistently reduce the usefulness of big data in achieving fundamental goals such as effective and efficient data integration. This limits the generalizability of scientific studies and leads to paradoxical and conflicting conclusions. We provide examples of this from the assessment of risk factors. In conclusion, new paradigms and new design schemes are needed to reach inferential conclusions that are meaningful and informative when dealing with data collected during emergencies like COVID-19. |
format | Online Article Text |
id | pubmed-9896906 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9896906 2023-02-04 The reproducibility of COVID-19 data analysis: paradoxes, pitfalls, and future challenges Serio, Clelia Di Malgaroli, Antonio Ferrari, Paolo Kenett, Ron S PNAS Nexus Perspectives Oxford University Press 2022-08-23 /pmc/articles/PMC9896906/ /pubmed/36741433 http://dx.doi.org/10.1093/pnasnexus/pgac125 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of the National Academy of Sciences. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Perspectives Serio, Clelia Di Malgaroli, Antonio Ferrari, Paolo Kenett, Ron S The reproducibility of COVID-19 data analysis: paradoxes, pitfalls, and future challenges |
title | The reproducibility of COVID-19 data analysis: paradoxes, pitfalls, and future challenges |
title_sort | reproducibility of covid-19 data analysis: paradoxes, pitfalls, and future challenges |
topic | Perspectives |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9896906/ https://www.ncbi.nlm.nih.gov/pubmed/36741433 http://dx.doi.org/10.1093/pnasnexus/pgac125 |