SNPest: a probabilistic graphical model for estimating genotypes

BACKGROUND: As the use of next-generation sequencing technologies is becoming more widespread, the need for robust software to help with the analysis is growing as well. A key challenge when analyzing sequencing data is the prediction of genotypes from the reads, i.e. correct inference of the underl...

Bibliographic Details
Main Authors:	Lindgreen, Stinus; Krogh, Anders; Pedersen, Jakob Skou
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Technical Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4203901/ https://www.ncbi.nlm.nih.gov/pubmed/25294605 http://dx.doi.org/10.1186/1756-0500-7-698

_version_	1782340459195006976
author	Lindgreen, Stinus; Krogh, Anders; Pedersen, Jakob Skou
author_facet	Lindgreen, Stinus; Krogh, Anders; Pedersen, Jakob Skou
author_sort	Lindgreen, Stinus
collection	PubMed
description	BACKGROUND: As the use of next-generation sequencing technologies is becoming more widespread, the need for robust software to help with the analysis is growing as well. A key challenge when analyzing sequencing data is the prediction of genotypes from the reads, i.e. correct inference of the underlying DNA sequences that gave rise to the sequenced fragments. For diploid organisms, the genotyper should be able to predict both alleles in the individual. Variations between the individual and the population can then be analyzed by looking for SNPs (single nucleotide polymorphisms) in order to investigate diseases or phenotypic features. To perform robust and high confidence genotyping and SNP calling, methods are needed that take the technology specific limitations into account and can model different sources of error. As an example, ancient DNA poses special challenges as the data is often shallow and subject to errors induced by post mortem damage. FINDINGS: We present a novel approach to the genotyping problem where a probabilistic framework describing the process from sampling to sequencing is implemented as a graphical model. This makes it possible to model technology specific errors and other sources of variation that can affect the result. The inferred genotype is given a posterior probability to signify the confidence in the result. SNPest has already been used to genotype large scale projects such as the first ancient human genome published in 2010. CONCLUSIONS: We compare the performance of SNPest to a number of other widely used genotypers on both real and simulated data, covering both haploid and diploid genomes. We investigate the effects of read depth, of removing adapters before mapping and genotyping, of using different mapping tools, and of using the correct model in the genotyping process. We show that the performance of SNPest is comparable to existing methods, and we also illustrate cases where SNPest has an advantage over other methods, e.g. when dealing with simulated ancient DNA. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-698) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-4203901
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4203901 2014-10-22 SNPest: a probabilistic graphical model for estimating genotypes Lindgreen, Stinus Krogh, Anders Pedersen, Jakob Skou BMC Res Notes Technical Note BioMed Central 2014-10-07 /pmc/articles/PMC4203901/ /pubmed/25294605 http://dx.doi.org/10.1186/1756-0500-7-698 Text en © Lindgreen et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Technical Note Lindgreen, Stinus Krogh, Anders Pedersen, Jakob Skou SNPest: a probabilistic graphical model for estimating genotypes
title	SNPest: a probabilistic graphical model for estimating genotypes
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4203901/ https://www.ncbi.nlm.nih.gov/pubmed/25294605 http://dx.doi.org/10.1186/1756-0500-7-698
