Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm
BACKGROUND: Genetic variations predispose individuals to hereditary diseases, play an important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucle...
Main Authors: | Ahn, Soyeon; Vikalo, Haris |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4503296/ https://www.ncbi.nlm.nih.gov/pubmed/26178880 http://dx.doi.org/10.1186/s12859-015-0651-8 |
_version_ | 1782381284637540352 |
---|---|
author | Ahn, Soyeon Vikalo, Haris |
author_facet | Ahn, Soyeon Vikalo, Haris |
author_sort | Ahn, Soyeon |
collection | PubMed |
description | BACKGROUND: Genetic variations predispose individuals to hereditary diseases, play an important role in the development of complex diseases, and impact drug metabolism. The full information about the DNA variations in the genome of an individual is given by haplotypes, the ordered lists of single nucleotide polymorphisms (SNPs) located on chromosomes. Affordable high-throughput DNA sequencing technologies enable routine acquisition of the data needed for the assembly of single individual haplotypes. However, state-of-the-art high-throughput sequencing platforms generate erroneous data, which induces uncertainty in the SNP and genotype calling procedures and, ultimately, adversely affects the accuracy of haplotyping. When inferring haplotype phase information, the vast majority of existing techniques for haplotype assembly assume that the genotype information is correct. This motivates the development of methods capable of joint genotype calling and haplotype assembly. RESULTS: We present a haplotype assembly algorithm, ParticleHap, that relies on a probabilistic description of the sequencing data to jointly infer genotypes and assemble the most likely haplotypes. Our method employs a deterministic sequential Monte Carlo algorithm that associates single nucleotide polymorphisms with haplotypes by exhaustively exploring all possible extensions of the partial haplotypes. The algorithm relies on genotype likelihoods rather than on often erroneously called genotypes, thus ensuring a more accurate assembly of the haplotypes. Results on both 1000 Genomes Project experimental data and simulation studies demonstrate that the proposed approach enables highly accurate solutions to the haplotype assembly problem while being computationally efficient and scalable, generally outperforming existing methods in terms of both accuracy and speed. CONCLUSIONS: The developed probabilistic framework and sequential Monte Carlo algorithm enable joint haplotype assembly and genotyping in a computationally efficient manner. Our results demonstrate fast and highly accurate haplotype assembly aided by the re-examination of erroneously called genotypes. A C code implementation of ParticleHap will be available for download from https://sites.google.com/site/asynoeun/particlehap. |
format | Online Article Text |
id | pubmed-4503296 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4503296 2015-07-16 Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm Ahn, Soyeon Vikalo, Haris BMC Bioinformatics Methodology Article BioMed Central 2015-07-16 /pmc/articles/PMC4503296/ /pubmed/26178880 http://dx.doi.org/10.1186/s12859-015-0651-8 Text en © Ahn and Vikalo; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Ahn, Soyeon Vikalo, Haris Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm |
title | Joint haplotype assembly and genotype calling via sequential Monte Carlo algorithm |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4503296/ https://www.ncbi.nlm.nih.gov/pubmed/26178880 http://dx.doi.org/10.1186/s12859-015-0651-8 |
work_keys_str_mv | AT ahnsoyeon jointhaplotypeassemblyandgenotypecallingviasequentialmontecarloalgorithm AT vikaloharis jointhaplotypeassemblyandgenotypecallingviasequentialmontecarloalgorithm |
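
The abstract describes a deterministic sequential Monte Carlo scheme that extends partial haplotypes one SNP at a time and works from likelihoods instead of fixed genotype calls. The following is only a minimal Python sketch of that general idea, not the ParticleHap implementation (which the record says is written in C). The function names, the uniform per-base error rate `err`, the equal chance that a read comes from either haplotype, and the "keep the `max_particles` best extensions" pruning rule are all assumptions made for illustration.

```python
import math
from itertools import product


def read_likelihood(read, hap, err):
    """P(read | hap), using only the sites of `read` already present in `hap`.

    `read` maps SNP index -> observed allele (0 or 1); `err` is an assumed
    uniform sequencing error rate.
    """
    p = 1.0
    for site, allele in read.items():
        if site < len(hap):
            p *= (1.0 - err) if hap[site] == allele else err
    return p


def log_weight(reads, h1, h2, err):
    """Log-likelihood of all reads under a partial haplotype pair.

    Assumption: each read originates from h1 or h2 with probability 1/2.
    """
    return sum(
        math.log(0.5 * read_likelihood(r, h1, err)
                 + 0.5 * read_likelihood(r, h2, err))
        for r in reads
    )


def assemble(reads, n_sites, err=0.05, max_particles=8):
    """Extend partial haplotype pairs site by site, keeping the best ones.

    Every particle is extended with all four allele pairs (0,0), (0,1),
    (1,0), (1,1), so homozygous and heterozygous genotypes are both
    considered instead of being fixed in advance.
    """
    particles = [((), ())]
    for _ in range(n_sites):
        candidates = []
        for h1, h2 in particles:
            for a, b in product((0, 1), repeat=2):
                e1, e2 = h1 + (a,), h2 + (b,)
                candidates.append((log_weight(reads, e1, e2, err), e1, e2))
        # Deterministic pruning: no resampling, just keep the top weights.
        candidates.sort(key=lambda c: c[0], reverse=True)
        particles = [(h1, h2) for _, h1, h2 in candidates[:max_particles]]
    return particles[0]


if __name__ == "__main__":
    # Hypothetical error-free reads from haplotypes 0101 and 1011.
    toy_reads = [
        {0: 0, 1: 1}, {0: 1, 1: 0},
        {1: 1, 2: 0}, {1: 0, 2: 1},
        {0: 0, 1: 1, 2: 0}, {0: 1, 1: 0, 2: 1},
        {2: 0, 3: 1}, {2: 1, 3: 1},
    ]
    print(assemble(toy_reads, n_sites=4))
```

Because the two haplotypes of a pair are interchangeable, the returned pair may come back in either order. The sketch recomputes the full likelihood at every step for readability; an efficient implementation would update weights incrementally.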