Major data analysis errors invalidate cancer microbiome findings

We re-analyzed the data from a recent large-scale study that reported strong correlations between DNA signatures of microbial organisms and 33 different cancer types and that created machine-learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundame...

Bibliographic Details
Main authors: Gihawi, Abraham; Ge, Yuchen; Lu, Jennifer; Puiu, Daniela; Xu, Amanda; Cooper, Colin S.; Brewer, Daniel S.; Pertea, Mihaela; Salzberg, Steven L.
Format: Online Article Text
Language: English
Published: American Society for Microbiology 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10653788/
https://www.ncbi.nlm.nih.gov/pubmed/37811944
http://dx.doi.org/10.1128/mbio.01607-23
collection PubMed
description We re-analyzed the data from a recent large-scale study that reported strong correlations between DNA signatures of microbial organisms and 33 different cancer types and that created machine-learning predictors with near-perfect accuracy at distinguishing among cancers. We found at least two fundamental flaws in the reported data and in the methods: (i) errors in the genome database and the associated computational methods led to millions of false-positive findings of bacterial reads across all samples, largely because most of the sequences identified as bacteria were instead human; and (ii) errors in the transformation of the raw data created an artificial signature, even for microbes with no reads detected, tagging each tumor type with a distinct signal that the machine-learning programs then used to create an apparently accurate classifier. Each of these problems invalidates the results, leading to the conclusion that the microbiome-based classifiers for identifying cancer presented in the study are entirely wrong. These flaws have subsequently affected more than a dozen additional published studies that used the same data and whose results are likely invalid as well. IMPORTANCE: Recent reports showing that human cancers have a distinctive microbiome have led to a flurry of papers describing microbial signatures of different cancer types. Many of these reports are based on flawed data that, upon re-analysis, completely overturns the original findings. The re-analysis conducted here shows that most of the microbes originally reported as associated with cancer were not present at all in the samples. The original report of a cancer microbiome and more than a dozen follow-up studies are, therefore, likely to be invalid.
format Online
Article
Text
id pubmed-10653788
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Society for Microbiology
record_format MEDLINE/PubMed
spelling pubmed-10653788 2023-10-09 mBio Research Article American Society for Microbiology 2023-10-09 /pmc/articles/PMC10653788/ /pubmed/37811944 http://dx.doi.org/10.1128/mbio.01607-23 Text en Copyright © 2023 Gihawi et al. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International license (https://creativecommons.org/licenses/by/4.0/).
title Major data analysis errors invalidate cancer microbiome findings
topic Research Article