
Virus expression detection reveals RNA-sequencing contamination in TCGA

BACKGROUND: Contamination of reagents and cross contamination across samples is a long-recognized issue in molecular biology laboratories. While often innocuous, contamination can lead to inaccurate results. Cantalupo et al., for example, found HeLa-derived human papillomavirus 18 (H-HPV18) in sever...


Bibliographic Details
Main authors: Selitsky, Sara R., Marron, David, Hollern, Daniel, Mose, Lisle E., Hoadley, Katherine A., Jones, Corbin, Parker, Joel S., Dittmer, Dirk P., Perou, Charles M.
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986043/
https://www.ncbi.nlm.nih.gov/pubmed/31992194
http://dx.doi.org/10.1186/s12864-020-6483-6
_version_ 1783491902618730496
author Selitsky, Sara R.
Marron, David
Hollern, Daniel
Mose, Lisle E.
Hoadley, Katherine A.
Jones, Corbin
Parker, Joel S.
Dittmer, Dirk P.
Perou, Charles M.
author_facet Selitsky, Sara R.
Marron, David
Hollern, Daniel
Mose, Lisle E.
Hoadley, Katherine A.
Jones, Corbin
Parker, Joel S.
Dittmer, Dirk P.
Perou, Charles M.
author_sort Selitsky, Sara R.
collection PubMed
description BACKGROUND: Contamination of reagents and cross contamination across samples is a long-recognized issue in molecular biology laboratories. While often innocuous, contamination can lead to inaccurate results. Cantalupo et al., for example, found HeLa-derived human papillomavirus 18 (H-HPV18) in several of The Cancer Genome Atlas (TCGA) RNA-sequencing samples. This work motivated us to assess a greater number of samples and determine the origin of possible contaminations using viral sequences. To detect viruses with high specificity, we developed the publicly available workflow, VirDetect, that detects virus and laboratory vector sequences in RNA-seq samples. We applied VirDetect to 9143 RNA-seq samples sequenced at one TCGA sequencing center (28/33 cancer types) over 5 years. RESULTS: We confirmed that H-HPV18 was present in many samples and determined that viral transcripts from H-HPV18 significantly co-occurred with those from xenotropic mouse leukemia virus-related virus (XMRV). Using laboratory metadata and viral transcription, we determined that the likely contaminant was a pool of cell lines known as the “common reference”, which was sequenced alongside TCGA RNA-seq samples as a control to monitor quality across technology transitions (i.e. microarray to GAII to HiSeq), and to link RNA-seq to previous generation microarrays that standardly used the “common reference”. One of the cell lines in the pool was a laboratory isolate of MCF-7, which we discovered was infected with XMRV; another constituent of the pool was likely HeLa cells. CONCLUSIONS: Altogether, this indicates a multi-step contamination process. First, MCF-7 was infected with an XMRV. Second, this infected cell line was added to a pool of cell lines, which contained HeLa. Finally, RNA from this pool of cell lines contaminated several TCGA tumor samples most-likely during library construction. Thus, these human tumors with H-HPV or XMRV reads were likely not infected with H-HPV 18 or XMRV.
format Online
Article
Text
id pubmed-6986043
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-69860432020-01-30 Virus expression detection reveals RNA-sequencing contamination in TCGA Selitsky, Sara R. Marron, David Hollern, Daniel Mose, Lisle E. Hoadley, Katherine A. Jones, Corbin Parker, Joel S. Dittmer, Dirk P. Perou, Charles M. BMC Genomics Research Article BACKGROUND: Contamination of reagents and cross contamination across samples is a long-recognized issue in molecular biology laboratories. While often innocuous, contamination can lead to inaccurate results. Cantalupo et al., for example, found HeLa-derived human papillomavirus 18 (H-HPV18) in several of The Cancer Genome Atlas (TCGA) RNA-sequencing samples. This work motivated us to assess a greater number of samples and determine the origin of possible contaminations using viral sequences. To detect viruses with high specificity, we developed the publicly available workflow, VirDetect, that detects virus and laboratory vector sequences in RNA-seq samples. We applied VirDetect to 9143 RNA-seq samples sequenced at one TCGA sequencing center (28/33 cancer types) over 5 years. RESULTS: We confirmed that H-HPV18 was present in many samples and determined that viral transcripts from H-HPV18 significantly co-occurred with those from xenotropic mouse leukemia virus-related virus (XMRV). Using laboratory metadata and viral transcription, we determined that the likely contaminant was a pool of cell lines known as the “common reference”, which was sequenced alongside TCGA RNA-seq samples as a control to monitor quality across technology transitions (i.e. microarray to GAII to HiSeq), and to link RNA-seq to previous generation microarrays that standardly used the “common reference”. One of the cell lines in the pool was a laboratory isolate of MCF-7, which we discovered was infected with XMRV; another constituent of the pool was likely HeLa cells. CONCLUSIONS: Altogether, this indicates a multi-step contamination process. First, MCF-7 was infected with an XMRV. Second, this infected cell line was added to a pool of cell lines, which contained HeLa. Finally, RNA from this pool of cell lines contaminated several TCGA tumor samples most-likely during library construction. Thus, these human tumors with H-HPV or XMRV reads were likely not infected with H-HPV 18 or XMRV. BioMed Central 2020-01-28 /pmc/articles/PMC6986043/ /pubmed/31992194 http://dx.doi.org/10.1186/s12864-020-6483-6 Text en © The Author(s). 2020 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Selitsky, Sara R.
Marron, David
Hollern, Daniel
Mose, Lisle E.
Hoadley, Katherine A.
Jones, Corbin
Parker, Joel S.
Dittmer, Dirk P.
Perou, Charles M.
Virus expression detection reveals RNA-sequencing contamination in TCGA
title Virus expression detection reveals RNA-sequencing contamination in TCGA
title_full Virus expression detection reveals RNA-sequencing contamination in TCGA
title_fullStr Virus expression detection reveals RNA-sequencing contamination in TCGA
title_full_unstemmed Virus expression detection reveals RNA-sequencing contamination in TCGA
title_short Virus expression detection reveals RNA-sequencing contamination in TCGA
title_sort virus expression detection reveals rna-sequencing contamination in tcga
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986043/
https://www.ncbi.nlm.nih.gov/pubmed/31992194
http://dx.doi.org/10.1186/s12864-020-6483-6
work_keys_str_mv AT selitskysarar virusexpressiondetectionrevealsrnasequencingcontaminationintcga
AT marrondavid virusexpressiondetectionrevealsrnasequencingcontaminationintcga
AT hollerndaniel virusexpressiondetectionrevealsrnasequencingcontaminationintcga
AT moselislee virusexpressiondetectionrevealsrnasequencingcontaminationintcga
AT hoadleykatherinea virusexpressiondetectionrevealsrnasequencingcontaminationintcga
AT jonescorbin virusexpressiondetectionrevealsrnasequencingcontaminationintcga
AT parkerjoels virusexpressiondetectionrevealsrnasequencingcontaminationintcga
AT dittmerdirkp virusexpressiondetectionrevealsrnasequencingcontaminationintcga
AT peroucharlesm virusexpressiondetectionrevealsrnasequencingcontaminationintcga