Virus expression detection reveals RNA-sequencing contamination in TCGA
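Main authors: | Selitsky, Sara R.; Marron, David; Hollern, Daniel; Mose, Lisle E.; Hoadley, Katherine A.; Jones, Corbin; Parker, Joel S.; Dittmer, Dirk P.; Perou, Charles M. |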
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6986043/ https://www.ncbi.nlm.nih.gov/pubmed/31992194 http://dx.doi.org/10.1186/s12864-020-6483-6 |
field | value |
---|---|
author | Selitsky, Sara R.; Marron, David; Hollern, Daniel; Mose, Lisle E.; Hoadley, Katherine A.; Jones, Corbin; Parker, Joel S.; Dittmer, Dirk P.; Perou, Charles M. |
collection | PubMed |
description | BACKGROUND: Contamination of reagents and cross-contamination across samples are long-recognized issues in molecular biology laboratories. While often innocuous, contamination can lead to inaccurate results. Cantalupo et al., for example, found HeLa-derived human papillomavirus 18 (H-HPV18) in several of The Cancer Genome Atlas (TCGA) RNA-sequencing samples. This work motivated us to assess a greater number of samples and to determine the origin of possible contamination using viral sequences. To detect viruses with high specificity, we developed VirDetect, a publicly available workflow that detects virus and laboratory vector sequences in RNA-seq samples. We applied VirDetect to 9143 RNA-seq samples sequenced at one TCGA sequencing center (28 of 33 cancer types) over 5 years. RESULTS: We confirmed that H-HPV18 was present in many samples and determined that viral transcripts from H-HPV18 significantly co-occurred with those from xenotropic mouse leukemia virus-related virus (XMRV). Using laboratory metadata and viral transcription, we determined that the likely contaminant was a pool of cell lines known as the “common reference”, which was sequenced alongside TCGA RNA-seq samples as a control to monitor quality across technology transitions (i.e., microarray to GAII to HiSeq) and to link RNA-seq to previous-generation microarrays that routinely used the “common reference”. One of the cell lines in the pool was a laboratory isolate of MCF-7, which we discovered was infected with XMRV; another constituent of the pool was likely HeLa cells. CONCLUSIONS: Altogether, this indicates a multi-step contamination process. First, MCF-7 was infected with an XMRV. Second, this infected cell line was added to a pool of cell lines that contained HeLa. Finally, RNA from this pool of cell lines contaminated several TCGA tumor samples, most likely during library construction. Thus, these human tumors with H-HPV18 or XMRV reads were likely not infected with H-HPV18 or XMRV. |
format | Online Article Text |
id | pubmed-6986043 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Genomics |
published | 2020-01-28 |
license | © The Author(s) 2020. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Virus expression detection reveals RNA-sequencing contamination in TCGA |
topic | Research Article |