Distinguishing potential bacteria-tumor associations from contamination in a secondary data analysis of public cancer genome sequence data

BACKGROUND: A variety of bacteria are known to influence carcinogenesis. Therefore, we sought to investigate if publicly available whole genome and whole transcriptome sequencing data generated by large public cancer genome efforts, like The Cancer Genome Atlas (TCGA), could be used to identify bact...

Bibliographic Details
Main Authors:	Robinson, Kelly M.; Crabtree, Jonathan; Mattick, John S. A.; Anderson, Kathleen E.; Dunning Hotopp, Julie C.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5264480/ https://www.ncbi.nlm.nih.gov/pubmed/28118849 http://dx.doi.org/10.1186/s40168-016-0224-8

author	Robinson, Kelly M.; Crabtree, Jonathan; Mattick, John S. A.; Anderson, Kathleen E.; Dunning Hotopp, Julie C.
author_sort	Robinson, Kelly M.
collection	PubMed
description	BACKGROUND: A variety of bacteria are known to influence carcinogenesis. Therefore, we sought to investigate if publicly available whole genome and whole transcriptome sequencing data generated by large public cancer genome efforts, like The Cancer Genome Atlas (TCGA), could be used to identify bacteria associated with cancer. The Burrows-Wheeler aligner (BWA) was used to align a subset of Illumina paired-end sequencing data from TCGA to the human reference genome and all complete bacterial genomes in the RefSeq database in an effort to identify bacterial read pairs from the microbiome. RESULTS: Through careful consideration of all of the bacterial taxa present in the cancer types investigated, their relative abundance, and batch effects, we were able to identify some read pairs from certain taxa as likely resulting from contamination. In particular, the presence of Mycobacterium tuberculosis complex in the ovarian serous cystadenocarcinoma (OV) and glioblastoma multiforme (GBM) samples was correlated with the sequencing center of the samples. Additionally, there was a correlation between the presence of Ralstonia spp. and two specific plates of acute myeloid leukemia (AML) samples. At the end, associations remained between Pseudomonas-like and Acinetobacter-like read pairs in AML, and Pseudomonas-like read pairs in stomach adenocarcinoma (STAD) that could not be explained through batch effects or systematic contamination as seen in other samples. CONCLUSIONS: This approach suggests that it is possible to identify bacteria that may be present in human tumor samples from public genome sequencing data that can be examined further experimentally. More weight should be given to this approach in the future when bacterial associations with diseases are suspected. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s40168-016-0224-8) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5264480
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5264480 2017-01-30 Microbiome Research BioMed Central 2017-01-25 /pmc/articles/PMC5264480/ /pubmed/28118849 http://dx.doi.org/10.1186/s40168-016-0224-8 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Distinguishing potential bacteria-tumor associations from contamination in a secondary data analysis of public cancer genome sequence data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5264480/ https://www.ncbi.nlm.nih.gov/pubmed/28118849 http://dx.doi.org/10.1186/s40168-016-0224-8

Similar Items