Efficient algorithms for polyploid haplotype phasing

BACKGROUND: Inference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and is a key component for many analyses including admixture mapping, identifying regions of identity by descent and imputation. Haplotype phasing based on sequencing read...

Bibliographic Details
Main authors: He, Dan, Saha, Subrata, Finkers, Richard, Parida, Laxmi
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5954289/
https://www.ncbi.nlm.nih.gov/pubmed/29764364
http://dx.doi.org/10.1186/s12864-018-4464-9
_version_ 1783323491549839360
author He, Dan
Saha, Subrata
Finkers, Richard
Parida, Laxmi
author_facet He, Dan
Saha, Subrata
Finkers, Richard
Parida, Laxmi
author_sort He, Dan
collection PubMed
description BACKGROUND: Inference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and is a key component for many analyses including admixture mapping, identifying regions of identity by descent and imputation. Haplotype phasing based on sequencing reads has attracted considerable attention. Diploid haplotype phasing, where the two haplotypes are complementary, has been studied extensively. In this work, we focused on polyploid haplotype phasing, where we aim to phase more than two haplotypes at the same time from sequencing data. The problem is much more complicated, as the search space becomes much larger and the haplotypes no longer need to be complementary. RESULTS: We proposed two algorithms: (1) Poly-Harsh, a Gibbs sampling based algorithm which alternately samples haplotypes and read assignments to minimize the mismatches between the reads and the phased haplotypes; (2) an efficient algorithm to concatenate haplotype blocks into contiguous haplotypes. CONCLUSIONS: Our experiments showed that our method is able to improve the quality of the phased haplotypes over the state-of-the-art methods. To our knowledge, our algorithm for haplotype block concatenation is the first algorithm that leverages the shared information across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.
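The alternating sampling idea named in the description can be illustrated with a small sketch. This is not the authors' Poly-Harsh implementation: the read encoding (a dict from site index to a 0/1 allele), the `temp` parameter, the exponential weighting, and all function names are assumptions made for illustration only.

```python
import math
import random


def mismatches(read, hap):
    """Count sites where a read (dict: site -> 0/1 allele) disagrees with a haplotype."""
    return sum(1 for site, allele in read.items() if hap[site] != allele)


def total_cost(reads, haps, assign):
    """Total mismatches between each read and the haplotype it is assigned to."""
    return sum(mismatches(r, haps[a]) for r, a in zip(reads, assign))


def sample_index(rng, costs, temp):
    """Sample an index with probability proportional to exp(-cost / temp)."""
    low = min(costs)  # shift by the minimum so at least one weight equals 1.0
    weights = [math.exp(-(c - low) / temp) for c in costs]
    return rng.choices(range(len(costs)), weights=weights)[0]


def gibbs_phase(reads, n_sites, ploidy, iters=200, temp=0.5, seed=0):
    """Hypothetical sketch: alternate between sampling read assignments and alleles."""
    rng = random.Random(seed)
    haps = [[rng.randint(0, 1) for _ in range(n_sites)] for _ in range(ploidy)]
    assign = [rng.randrange(ploidy) for _ in reads]
    best = (total_cost(reads, haps, assign), [h[:] for h in haps], assign[:])
    for _ in range(iters):
        # Step 1: sample each read's haplotype given the current haplotypes.
        for i, read in enumerate(reads):
            assign[i] = sample_index(rng, [mismatches(read, h) for h in haps], temp)
        # Step 2: sample each allele given the current read assignments.
        for k in range(ploidy):
            for site in range(n_sites):
                cost = [0, 0]  # cost[a] = mismatches incurred if allele a is placed here
                for read, a in zip(reads, assign):
                    if a == k and site in read:
                        cost[1 - read[site]] += 1
                haps[k][site] = sample_index(rng, cost, temp)
        now = total_cost(reads, haps, assign)
        if now < best[0]:  # remember the lowest-mismatch state seen so far
            best = (now, [h[:] for h in haps], assign[:])
    return best


if __name__ == "__main__":
    # Toy reads covering four sites; the data are made up for the example.
    toy_reads = [{0: 0, 1: 0}, {1: 0, 2: 1}, {0: 1, 1: 1}, {2: 0, 3: 1}, {0: 0, 3: 0}]
    cost, haps, assign = gibbs_phase(toy_reads, n_sites=4, ploidy=3)
    print(cost, haps, assign)
```

Because the sampler is stochastic, the sketch keeps the lowest-mismatch state it has visited rather than the final one; a lower `temp` makes each sampling step closer to a greedy choice.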
format Online
Article
Text
id pubmed-5954289
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5954289 2018-05-21 Efficient algorithms for polyploid haplotype phasing He, Dan Saha, Subrata Finkers, Richard Parida, Laxmi BMC Genomics Research BACKGROUND: Inference of haplotypes, or the sequence of alleles along the same chromosomes, is a fundamental problem in genetics and is a key component for many analyses including admixture mapping, identifying regions of identity by descent and imputation. Haplotype phasing based on sequencing reads has attracted considerable attention. Diploid haplotype phasing, where the two haplotypes are complementary, has been studied extensively. In this work, we focused on polyploid haplotype phasing, where we aim to phase more than two haplotypes at the same time from sequencing data. The problem is much more complicated, as the search space becomes much larger and the haplotypes no longer need to be complementary. RESULTS: We proposed two algorithms: (1) Poly-Harsh, a Gibbs sampling based algorithm which alternately samples haplotypes and read assignments to minimize the mismatches between the reads and the phased haplotypes; (2) an efficient algorithm to concatenate haplotype blocks into contiguous haplotypes. CONCLUSIONS: Our experiments showed that our method is able to improve the quality of the phased haplotypes over the state-of-the-art methods. To our knowledge, our algorithm for haplotype block concatenation is the first algorithm that leverages the shared information across multiple individuals to construct contiguous haplotypes. Our experiments showed that it is both efficient and effective.
BioMed Central 2018-05-09 /pmc/articles/PMC5954289/ /pubmed/29764364 http://dx.doi.org/10.1186/s12864-018-4464-9 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
He, Dan
Saha, Subrata
Finkers, Richard
Parida, Laxmi
Efficient algorithms for polyploid haplotype phasing
title Efficient algorithms for polyploid haplotype phasing
title_full Efficient algorithms for polyploid haplotype phasing
title_fullStr Efficient algorithms for polyploid haplotype phasing
title_full_unstemmed Efficient algorithms for polyploid haplotype phasing
title_short Efficient algorithms for polyploid haplotype phasing
title_sort efficient algorithms for polyploid haplotype phasing
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5954289/
https://www.ncbi.nlm.nih.gov/pubmed/29764364
http://dx.doi.org/10.1186/s12864-018-4464-9
work_keys_str_mv AT hedan efficientalgorithmsforpolyploidhaplotypephasing
AT sahasubrata efficientalgorithmsforpolyploidhaplotypephasing
AT finkersrichard efficientalgorithmsforpolyploidhaplotypephasing
AT paridalaxmi efficientalgorithmsforpolyploidhaplotypephasing