SNPest: a probabilistic graphical model for estimating genotypes
BACKGROUND: As the use of next-generation sequencing technologies is becoming more widespread, the need for robust software to help with the analysis is growing as well. A key challenge when analyzing sequencing data is the prediction of genotypes from the reads, i.e. correct inference of the underl...
Main authors: | Lindgreen, Stinus; Krogh, Anders; Pedersen, Jakob Skou |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4203901/ https://www.ncbi.nlm.nih.gov/pubmed/25294605 http://dx.doi.org/10.1186/1756-0500-7-698 |
_version_ | 1782340459195006976 |
---|---|
author | Lindgreen, Stinus Krogh, Anders Pedersen, Jakob Skou |
author_facet | Lindgreen, Stinus Krogh, Anders Pedersen, Jakob Skou |
author_sort | Lindgreen, Stinus |
collection | PubMed |
description | BACKGROUND: As the use of next-generation sequencing technologies is becoming more widespread, the need for robust software to help with the analysis is growing as well. A key challenge when analyzing sequencing data is the prediction of genotypes from the reads, i.e. correct inference of the underlying DNA sequences that gave rise to the sequenced fragments. For diploid organisms, the genotyper should be able to predict both alleles in the individual. Variations between the individual and the population can then be analyzed by looking for SNPs (single nucleotide polymorphisms) in order to investigate diseases or phenotypic features. To perform robust and high confidence genotyping and SNP calling, methods are needed that take the technology specific limitations into account and can model different sources of error. As an example, ancient DNA poses special challenges as the data is often shallow and subject to errors induced by post mortem damage. FINDINGS: We present a novel approach to the genotyping problem where a probabilistic framework describing the process from sampling to sequencing is implemented as a graphical model. This makes it possible to model technology specific errors and other sources of variation that can affect the result. The inferred genotype is given a posterior probability to signify the confidence in the result. SNPest has already been used to genotype large scale projects such as the first ancient human genome published in 2010. CONCLUSIONS: We compare the performance of SNPest to a number of other widely used genotypers on both real and simulated data, covering both haploid and diploid genomes. We investigate the effects of read depth, of removing adapters before mapping and genotyping, of using different mapping tools, and of using the correct model in the genotyping process. 
We show that the performance of SNPest is comparable to existing methods, and we also illustrate cases where SNPest has an advantage over other methods, e.g. when dealing with simulated ancient DNA. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-698) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4203901 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4203901 2014-10-22 SNPest: a probabilistic graphical model for estimating genotypes Lindgreen, Stinus Krogh, Anders Pedersen, Jakob Skou BMC Res Notes Technical Note BACKGROUND: As the use of next-generation sequencing technologies is becoming more widespread, the need for robust software to help with the analysis is growing as well. A key challenge when analyzing sequencing data is the prediction of genotypes from the reads, i.e. correct inference of the underlying DNA sequences that gave rise to the sequenced fragments. For diploid organisms, the genotyper should be able to predict both alleles in the individual. Variations between the individual and the population can then be analyzed by looking for SNPs (single nucleotide polymorphisms) in order to investigate diseases or phenotypic features. To perform robust and high confidence genotyping and SNP calling, methods are needed that take the technology specific limitations into account and can model different sources of error. As an example, ancient DNA poses special challenges as the data is often shallow and subject to errors induced by post mortem damage. FINDINGS: We present a novel approach to the genotyping problem where a probabilistic framework describing the process from sampling to sequencing is implemented as a graphical model. This makes it possible to model technology specific errors and other sources of variation that can affect the result. The inferred genotype is given a posterior probability to signify the confidence in the result. SNPest has already been used to genotype large scale projects such as the first ancient human genome published in 2010. CONCLUSIONS: We compare the performance of SNPest to a number of other widely used genotypers on both real and simulated data, covering both haploid and diploid genomes.
We investigate the effects of read depth, of removing adapters before mapping and genotyping, of using different mapping tools, and of using the correct model in the genotyping process. We show that the performance of SNPest is comparable to existing methods, and we also illustrate cases where SNPest has an advantage over other methods, e.g. when dealing with simulated ancient DNA. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-698) contains supplementary material, which is available to authorized users. BioMed Central 2014-10-07 /pmc/articles/PMC4203901/ /pubmed/25294605 http://dx.doi.org/10.1186/1756-0500-7-698 Text en © Lindgreen et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Technical Note Lindgreen, Stinus Krogh, Anders Pedersen, Jakob Skou SNPest: a probabilistic graphical model for estimating genotypes |
title | SNPest: a probabilistic graphical model for estimating genotypes |
title_full | SNPest: a probabilistic graphical model for estimating genotypes |
title_fullStr | SNPest: a probabilistic graphical model for estimating genotypes |
title_full_unstemmed | SNPest: a probabilistic graphical model for estimating genotypes |
title_short | SNPest: a probabilistic graphical model for estimating genotypes |
title_sort | snpest: a probabilistic graphical model for estimating genotypes |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4203901/ https://www.ncbi.nlm.nih.gov/pubmed/25294605 http://dx.doi.org/10.1186/1756-0500-7-698 |
work_keys_str_mv | AT lindgreenstinus snpestaprobabilisticgraphicalmodelforestimatinggenotypes AT kroghanders snpestaprobabilisticgraphicalmodelforestimatinggenotypes AT pedersenjakobskou snpestaprobabilisticgraphicalmodelforestimatinggenotypes |