
DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations

Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identificati...


Bibliographic Details
Main Authors:	Andrews, T. Daniel, Jeelall, Yogesh, Talaulikar, Dipti, Goodnow, Christopher C., Field, Matthew A.
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2016
Subjects:	Bioinformatics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4888318/ https://www.ncbi.nlm.nih.gov/pubmed/27257550 http://dx.doi.org/10.7717/peerj.2074

_version_	1782434844930736128
author	Andrews, T. Daniel; Jeelall, Yogesh; Talaulikar, Dipti; Goodnow, Christopher C.; Field, Matthew A.
author_facet	Andrews, T. Daniel; Jeelall, Yogesh; Talaulikar, Dipti; Goodnow, Christopher C.; Field, Matthew A.
author_sort	Andrews, T. Daniel
collection	PubMed
description	Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must first be computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires additional variant detection analysis steps to be run specific to each read group, thus representing a significant computational challenge. While sequencing technologies for producing these data are approaching maturity, the lack of available computational tools for analysing such heterogeneous sequence data represents an obstacle to the widespread adoption of this technology. Results. Using synthetic data we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find DeepSNVMiner obtains significantly lower false positive and false negative rates compared to popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as the variant concentration levels decrease. In a dilution series with genomic DNA from two cell lines, we find DeepSNVMiner identifies a known somatic variant when present at concentrations of only 1 in 1,000 molecules in the input material, the lowest concentration amongst all variant callers tested. Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence variants that likely arose during DNA amplification. The workflow remains flexible such that it may be customised to variants of the data production protocol used, and supports reproducible analysis through detailed logging and reporting of results. DeepSNVMiner is available for academic non-commercial research purposes at https://github.com/mattmattmattmatt/DeepSNVMiner.
format	Online Article Text
id	pubmed-4888318
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-4888318 2016-06-02 DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations Andrews, T. Daniel; Jeelall, Yogesh; Talaulikar, Dipti; Goodnow, Christopher C.; Field, Matthew A. PeerJ Bioinformatics Background. Massively parallel sequencing technology is being used to sequence highly diverse populations of DNA such as that derived from heterogeneous cell mixtures containing both wild-type and disease-related states. At the core of such molecule tagging techniques is the tagging and identification of sequence reads derived from individual input DNA molecules, which must first be computationally disambiguated to generate read groups sharing common sequence tags, with each read group representing a single input DNA molecule. This disambiguation typically generates huge numbers of read groups, each of which requires additional variant detection analysis steps to be run specific to each read group, thus representing a significant computational challenge. While sequencing technologies for producing these data are approaching maturity, the lack of available computational tools for analysing such heterogeneous sequence data represents an obstacle to the widespread adoption of this technology. Results. Using synthetic data we successfully detect unique variants at dilution levels of 1 in 1,000,000 molecules, and find DeepSNVMiner obtains significantly lower false positive and false negative rates compared to popular variant callers GATK, SAMTools, FreeBayes and LoFreq, particularly as the variant concentration levels decrease. In a dilution series with genomic DNA from two cell lines, we find DeepSNVMiner identifies a known somatic variant when present at concentrations of only 1 in 1,000 molecules in the input material, the lowest concentration amongst all variant callers tested. Conclusions. Here we present DeepSNVMiner, a tool to disambiguate tagged sequence groups and robustly identify sequence variants specific to subsets of starting DNA molecules that may indicate the presence of a disease. DeepSNVMiner is an automated workflow of custom sequence analysis utilities and open source tools able to differentiate somatic DNA variants from artefactual sequence variants that likely arose during DNA amplification. The workflow remains flexible such that it may be customised to variants of the data production protocol used, and supports reproducible analysis through detailed logging and reporting of results. DeepSNVMiner is available for academic non-commercial research purposes at https://github.com/mattmattmattmatt/DeepSNVMiner. PeerJ Inc. 2016-05-24 /pmc/articles/PMC4888318/ /pubmed/27257550 http://dx.doi.org/10.7717/peerj.2074 Text en ©2016 Andrews et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
spellingShingle	Bioinformatics Andrews, T. Daniel Jeelall, Yogesh Talaulikar, Dipti Goodnow, Christopher C. Field, Matthew A. DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations
title	DeepSNVMiner: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations
title_sort	deepsnvminer: a sequence analysis tool to detect emergent, rare mutations in subsets of cell populations
topic	Bioinformatics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4888318/ https://www.ncbi.nlm.nih.gov/pubmed/27257550 http://dx.doi.org/10.7717/peerj.2074
work_keys_str_mv	AT andrewstdaniel deepsnvminerasequenceanalysistooltodetectemergentraremutationsinsubsetsofcellpopulations AT jeelallyogesh deepsnvminerasequenceanalysistooltodetectemergentraremutationsinsubsetsofcellpopulations AT talaulikardipti deepsnvminerasequenceanalysistooltodetectemergentraremutationsinsubsetsofcellpopulations AT goodnowchristopherc deepsnvminerasequenceanalysistooltodetectemergentraremutationsinsubsetsofcellpopulations AT fieldmatthewa deepsnvminerasequenceanalysistooltodetectemergentraremutationsinsubsetsofcellpopulations

