
cDNA-detector: detection and removal of cDNA contamination in DNA sequencing libraries

BACKGROUND: Exogenous cDNA introduced into an experimental system, either intentionally or accidentally, can appear as added read coverage over that gene in next-generation sequencing libraries derived from this system. If not properly recognized and managed, this cross-contamination with exogenous...


Bibliographic Details
Main authors: Qi, Meifang; Nayar, Utthara; Ludwig, Leif S.; Wagle, Nikhil; Rheinbay, Esther
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8709999/
https://www.ncbi.nlm.nih.gov/pubmed/34952565
http://dx.doi.org/10.1186/s12859-021-04529-2
author Qi, Meifang
Nayar, Utthara
Ludwig, Leif S.
Wagle, Nikhil
Rheinbay, Esther
collection PubMed
description BACKGROUND: Exogenous cDNA introduced into an experimental system, either intentionally or accidentally, can appear as added read coverage over that gene in next-generation sequencing (NGS) libraries derived from this system. If not properly recognized and managed, this cross-contamination with exogenous signal can lead to incorrect interpretation of research results. Yet this problem is not routinely addressed in current sequence processing pipelines. RESULTS: We present cDNA-detector, a computational tool to identify and remove exogenous cDNA contamination in DNA sequencing experiments. We demonstrate that cDNA-detector can identify cDNAs quickly and accurately from alignment files. A source inference step attempts to separate endogenous cDNAs (retrocopied genes) from potential cloned, exogenous cDNAs. cDNA-detector also provides a mechanism to remove detected cDNAs from the alignment. Simulation studies show that cDNA-detector is highly sensitive and specific, outperforming existing tools. We apply cDNA-detector to several highly cited public databases (TCGA, ENCODE, NCBI SRA) and show that contaminant genes appear in sequencing experiments, where they lead to incorrect coverage peak calls. CONCLUSIONS: cDNA-detector is a user-friendly and accurate tool to detect and remove cDNA contamination in NGS libraries. Its two-step design, detection followed by removal, reduces the risk of removing true variants since it allows for manual review of candidates. We find that contamination with intentionally and accidentally introduced cDNAs is an underappreciated problem even in widely used consortium datasets, where it can lead to spurious results. Our findings highlight the importance of sensitive detection and removal of contaminant cDNA from NGS libraries before downstream analysis. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04529-2.
format Online
Article
Text
id pubmed-8709999
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8709999 2022-01-05 cDNA-detector: detection and removal of cDNA contamination in DNA sequencing libraries Qi, Meifang Nayar, Utthara Ludwig, Leif S. Wagle, Nikhil Rheinbay, Esther BMC Bioinformatics Software BioMed Central 2021-12-24 /pmc/articles/PMC8709999/ /pubmed/34952565 http://dx.doi.org/10.1186/s12859-021-04529-2 Text en © The Author(s) 2021
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title cDNA-detector: detection and removal of cDNA contamination in DNA sequencing libraries
topic Software