Major data analysis errors invalidate cancer microbiome findings

We re-analyzed the data from a recent large-scale study that reported strong correlations between microbial organisms and 33 different cancer types, and that created machine learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the...

Bibliographic Details
Main Authors: Gihawi, Abraham; Ge, Yuchen; Lu, Jennifer; Puiu, Daniela; Xu, Amanda; Cooper, Colin S.; Brewer, Daniel S.; Pertea, Mihaela; Salzberg, Steven L.
Format: Online, Article, Text
Language: English
Published: Cold Spring Harbor Laboratory, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418105/
https://www.ncbi.nlm.nih.gov/pubmed/37577699
http://dx.doi.org/10.1101/2023.07.28.550993
author Gihawi, Abraham
Ge, Yuchen
Lu, Jennifer
Puiu, Daniela
Xu, Amanda
Cooper, Colin S.
Brewer, Daniel S.
Pertea, Mihaela
Salzberg, Steven L.
collection PubMed
description We re-analyzed the data from a recent large-scale study that reported strong correlations between microbial organisms and 33 different cancer types, and that created machine learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (1) errors in the genome database and the associated computational methods led to millions of false positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (2) errors in transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well.
format Online
Article
Text
id pubmed-10418105
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10418105 2023-08-12 bioRxiv Article Cold Spring Harbor Laboratory 2023-07-31 /pmc/articles/PMC10418105/ /pubmed/37577699 http://dx.doi.org/10.1101/2023.07.28.550993 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
title Major data analysis errors invalidate cancer microbiome findings
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10418105/
https://www.ncbi.nlm.nih.gov/pubmed/37577699
http://dx.doi.org/10.1101/2023.07.28.550993
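The record's description says that a transformation of the raw data gave even microbes with no detected reads a distinct value for each tumor type, which a classifier could then exploit. The sketch below illustrates that general failure mode under stated assumptions. Everything in it is hypothetical: the toy counts, the `flawed_transform` function (an assumed example of such an error, which normalizes each sample by the mean sequencing depth of its tumor type instead of its own depth), and the helpers `zero_feature_leaks` and `centroid_accuracy`. It is not the transformation used in the re-analyzed study. It only shows how a label-dependent step can turn an all-zero feature into a perfectly separating one.

```python
import math
from collections import defaultdict
from statistics import mean

# Hypothetical toy data (NOT from the study): 3 tumor types, 2 samples each.
# Every sample has zero reads for "taxon_Z"; only sequencing depth differs.
LABELS = ["A", "A", "B", "B", "C", "C"]
DEPTHS = [1_000, 1_200, 5_000, 5_400, 20_000, 22_000]
RAW_TAXON_Z = [0, 0, 0, 0, 0, 0]


def flawed_transform(counts, depths, labels):
    """Hypothetical flawed normalization: scales each count by the MEAN depth
    of the sample's tumor type rather than by the sample's own depth.
    Because the scale factor depends on the label, a count of 0 is mapped to
    a different value for each tumor type."""
    by_label = defaultdict(list)
    for d, y in zip(depths, labels):
        by_label[y].append(d)
    group_depth = {y: mean(ds) for y, ds in by_label.items()}
    # log2 counts-per-million with a 0.5 pseudocount, using the group depth
    return [math.log2((c + 0.5) / group_depth[y] * 1e6)
            for c, y in zip(counts, labels)]


def zero_feature_leaks(raw, transformed):
    """True if a feature with no reads in any sample is non-constant after
    transformation, i.e. the transform has invented a signal."""
    return all(c == 0 for c in raw) and len(set(transformed)) > 1


def centroid_accuracy(values, labels):
    """Leave-one-out nearest-centroid accuracy using a single feature."""
    correct = 0
    for i, (v, y) in enumerate(zip(values, labels)):
        groups = defaultdict(list)
        for j, (vj, yj) in enumerate(zip(values, labels)):
            if j != i:  # hold out sample i
                groups[yj].append(vj)
        centroids = {g: mean(vs) for g, vs in groups.items()}
        pred = min(centroids, key=lambda g: abs(centroids[g] - v))
        correct += pred == y
    return correct / len(values)


if __name__ == "__main__":
    z = flawed_transform(RAW_TAXON_Z, DEPTHS, LABELS)
    print(zero_feature_leaks(RAW_TAXON_Z, z))  # True
    print(centroid_accuracy(z, LABELS))        # 1.0
```

With these toy numbers, a classifier that sees only the transformed all-zero feature scores perfectly, although the raw data contain no reads for that taxon at all. A check in the spirit of `zero_feature_leaks` is a cheap sanity test for any transformed abundance table: a feature with no reads in any sample should carry no information about the labels.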