Major data analysis errors invalidate cancer microbiome findings
We re-analyzed the data from a recent large-scale study that reported strong correlations between microbial organisms and 33 different cancer types, and that created machine learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the...
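Main authors: | Gihawi, Abraham; Ge, Yuchen; Lu, Jennifer; Puiu, Daniela; Xu, Amanda; Cooper, Colin S.; Brewer, Daniel S.; Pertea, Mihaela; Salzberg, Steven L. |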
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Cold Spring Harbor Laboratory
2023
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418105/ https://www.ncbi.nlm.nih.gov/pubmed/37577699 http://dx.doi.org/10.1101/2023.07.28.550993 |
_version_ | 1785088195791159296 |
---|---|
author | Gihawi, Abraham Ge, Yuchen Lu, Jennifer Puiu, Daniela Xu, Amanda Cooper, Colin S. Brewer, Daniel S. Pertea, Mihaela Salzberg, Steven L. |
author_facet | Gihawi, Abraham Ge, Yuchen Lu, Jennifer Puiu, Daniela Xu, Amanda Cooper, Colin S. Brewer, Daniel S. Pertea, Mihaela Salzberg, Steven L. |
author_sort | Gihawi, Abraham |
collection | PubMed |
description | We re-analyzed the data from a recent large-scale study that reported strong correlations between microbial organisms and 33 different cancer types, and that created machine learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (1) errors in the genome database and the associated computational methods led to millions of false positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (2) errors in transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well. |
format | Online Article Text |
id | pubmed-10418105 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10418105 2023-08-12 Major data analysis errors invalidate cancer microbiome findings Gihawi, Abraham Ge, Yuchen Lu, Jennifer Puiu, Daniela Xu, Amanda Cooper, Colin S. Brewer, Daniel S. Pertea, Mihaela Salzberg, Steven L. bioRxiv Article We re-analyzed the data from a recent large-scale study that reported strong correlations between microbial organisms and 33 different cancer types, and that created machine learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (1) errors in the genome database and the associated computational methods led to millions of false positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (2) errors in transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well. Cold Spring Harbor Laboratory 2023-07-31 /pmc/articles/PMC10418105/ /pubmed/37577699 http://dx.doi.org/10.1101/2023.07.28.550993 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Gihawi, Abraham Ge, Yuchen Lu, Jennifer Puiu, Daniela Xu, Amanda Cooper, Colin S. Brewer, Daniel S. Pertea, Mihaela Salzberg, Steven L. Major data analysis errors invalidate cancer microbiome findings |
title | Major data analysis errors invalidate cancer microbiome findings |
title_full | Major data analysis errors invalidate cancer microbiome findings |
title_fullStr | Major data analysis errors invalidate cancer microbiome findings |
title_full_unstemmed | Major data analysis errors invalidate cancer microbiome findings |
title_short | Major data analysis errors invalidate cancer microbiome findings |
title_sort | major data analysis errors invalidate cancer microbiome findings |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418105/ https://www.ncbi.nlm.nih.gov/pubmed/37577699 http://dx.doi.org/10.1101/2023.07.28.550993 |