The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process
With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite f...
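The record's full description states that allele frequencies at heterozygous loci vary more than a simple binomial distribution predicts once the protocol is modelled as a stochastic branching process. The sketch below is one hedged way to illustrate that claim; it is not the authors' model. It assumes a Galton-Watson process in which each molecule is copied with a fixed probability per amplification cycle, followed by read sampling with replacement. The function names and all parameter values (`start_copies`, `cycles`, `efficiency`, `depth`) are hypothetical and chosen only for illustration.

```python
import random
from statistics import pvariance


def amplify(copies, cycles, efficiency, rng):
    """Galton-Watson amplification (assumed model): in every cycle each
    molecule is duplicated independently with probability `efficiency`."""
    for _ in range(cycles):
        copies += sum(1 for _ in range(copies) if rng.random() < efficiency)
    return copies


def observed_alt_fraction(start_copies, cycles, efficiency, depth, rng):
    """Amplify both alleles of a heterozygous locus independently, then
    draw `depth` reads from the resulting pool (with replacement)."""
    ref = amplify(start_copies, cycles, efficiency, rng)
    alt = amplify(start_copies, cycles, efficiency, rng)
    pool_fraction = alt / (ref + alt)
    alt_reads = sum(1 for _ in range(depth) if rng.random() < pool_fraction)
    return alt_reads / depth


def simulate(trials=1000, start_copies=4, cycles=8, efficiency=0.8,
             depth=100, seed=1):
    """Return (variance under the branching model, variance of a plain
    binomial with p = 0.5 at the same depth). All defaults are illustrative."""
    rng = random.Random(seed)
    fractions = [
        observed_alt_fraction(start_copies, cycles, efficiency, depth, rng)
        for _ in range(trials)
    ]
    return pvariance(fractions), 0.25 / depth


if __name__ == "__main__":
    branching_var, binomial_var = simulate()
    print(f"branching-process variance: {branching_var:.5f}")
    print(f"binomial variance:          {binomial_var:.5f}")
```

Under these assumptions, the extra variance comes from the amplification step, so raising `depth` shrinks only the binomial component. This is consistent with, though not a demonstration of, the description's remark that technical replicates reduce error rates more than deeper sequencing does.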
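Main authors: | Heinrich, Verena; Stange, Jens; Dickhaus, Thorsten; Imkeller, Peter; Krüger, Ulrike; Bauer, Sebastian; Mundlos, Stefan; Robinson, Peter N.; Hecht, Jochen; Krawitz, Peter M. |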
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2012 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315291/ https://www.ncbi.nlm.nih.gov/pubmed/22127862 http://dx.doi.org/10.1093/nar/gkr1073 |
_version_ | 1782228207407202304 |
---|---|
author | Heinrich, Verena Stange, Jens Dickhaus, Thorsten Imkeller, Peter Krüger, Ulrike Bauer, Sebastian Mundlos, Stefan Robinson, Peter N. Hecht, Jochen Krawitz, Peter M. |
author_facet | Heinrich, Verena Stange, Jens Dickhaus, Thorsten Imkeller, Peter Krüger, Ulrike Bauer, Sebastian Mundlos, Stefan Robinson, Peter N. Hecht, Jochen Krawitz, Peter M. |
author_sort | Heinrich, Verena |
collection | PubMed |
description | With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement that is sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and demonstrate that the variance of allele frequencies at heterozygous loci is higher than expected by a simple binomial distribution. Due to this high variance, mutation callers relying on binomial distributed priors are less sensitive for heterozygous variants that deviate strongly from the expected mean frequency. Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth. |
format | Online Article Text |
id | pubmed-3315291 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-33152912012-03-30 The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process Heinrich, Verena Stange, Jens Dickhaus, Thorsten Imkeller, Peter Krüger, Ulrike Bauer, Sebastian Mundlos, Stefan Robinson, Peter N. Hecht, Jochen Krawitz, Peter M. Nucleic Acids Res Computational Biology With the availability of next-generation sequencing (NGS) technology, it is expected that sequence variants may be called on a genomic scale. Here, we demonstrate that a deeper understanding of the distribution of the variant call frequencies at heterozygous loci in NGS data sets is a prerequisite for sensitive variant detection. We model the crucial steps in an NGS protocol as a stochastic branching process and derive a mathematical framework for the expected distribution of alleles at heterozygous loci before measurement that is sequencing. We confirm our theoretical results by analyzing technical replicates of human exome data and demonstrate that the variance of allele frequencies at heterozygous loci is higher than expected by a simple binomial distribution. Due to this high variance, mutation callers relying on binomial distributed priors are less sensitive for heterozygous variants that deviate strongly from the expected mean frequency. Our results also indicate that error rates can be reduced to a greater degree by technical replicates than by increasing sequencing depth. Oxford University Press 2012-03 2011-11-29 /pmc/articles/PMC3315291/ /pubmed/22127862 http://dx.doi.org/10.1093/nar/gkr1073 Text en © The Author(s) 2011. Published by Oxford University Press. 
http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Computational Biology Heinrich, Verena Stange, Jens Dickhaus, Thorsten Imkeller, Peter Krüger, Ulrike Bauer, Sebastian Mundlos, Stefan Robinson, Peter N. Hecht, Jochen Krawitz, Peter M. The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process |
title | The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process |
title_full | The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process |
title_fullStr | The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process |
title_full_unstemmed | The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process |
title_short | The allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process |
title_sort | allele distribution in next-generation sequencing data sets is accurately described as the result of a stochastic branching process |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3315291/ https://www.ncbi.nlm.nih.gov/pubmed/22127862 http://dx.doi.org/10.1093/nar/gkr1073 |