
SDhaP: haplotype assembly for diploids and polyploids via semi-definite programming

BACKGROUND: The goal of haplotype assembly is to infer haplotypes of an individual from a mixture of sequenced chromosome fragments. Limited lengths of paired-end sequencing reads and inserts render haplotype assembly computationally challenging; in fact, most of the problem formulations are known t...


Bibliographic Details
Main authors: Das, Shreepriya; Vikalo, Haris
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4422552/
https://www.ncbi.nlm.nih.gov/pubmed/25885901
http://dx.doi.org/10.1186/s12864-015-1408-5
_version_ 1782370072166137856
author Das, Shreepriya
Vikalo, Haris
author_facet Das, Shreepriya
Vikalo, Haris
author_sort Das, Shreepriya
collection PubMed
description BACKGROUND: The goal of haplotype assembly is to infer haplotypes of an individual from a mixture of sequenced chromosome fragments. Limited lengths of paired-end sequencing reads and inserts render haplotype assembly computationally challenging; in fact, most of the problem formulations are known to be NP-hard. Dimensions (and, therefore, difficulty) of the haplotype assembly problems keep increasing as the sequencing technology advances and the lengths of reads and inserts grow. The computational challenges are even more pronounced in the case of polyploid haplotypes, whose assembly is considerably more difficult than in the case of diploids. Fast, accurate, and scalable methods for haplotype assembly of diploid and polyploid organisms are needed. RESULTS: We develop a novel framework for diploid/polyploid haplotype assembly from high-throughput sequencing data. The method formulates the haplotype assembly problem as a semi-definite program and exploits its special structure – namely, the low rank of the underlying solution – to solve it rapidly and with high accuracy. The developed framework is applicable to both diploid and polyploid species. The code for SDhaP is freely available at https://sourceforge.net/projects/sdhap. CONCLUSION: Extensive benchmarking tests on both real and simulated data show that the proposed algorithms outperform several well-known haplotype assembly methods in terms of either accuracy or speed or both. Useful recommendations for coverages needed to achieve near-optimal solutions are also provided.
format Online
Article
Text
id pubmed-4422552
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4422552 2015-05-07 SDhaP: haplotype assembly for diploids and polyploids via semi-definite programming Das, Shreepriya Vikalo, Haris BMC Genomics Methodology Article BioMed Central 2015-04-03 /pmc/articles/PMC4422552/ /pubmed/25885901 http://dx.doi.org/10.1186/s12864-015-1408-5 Text en © Das and Vikalo; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Das, Shreepriya
Vikalo, Haris
SDhaP: haplotype assembly for diploids and polyploids via semi-definite programming
title SDhaP: haplotype assembly for diploids and polyploids via semi-definite programming
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4422552/
https://www.ncbi.nlm.nih.gov/pubmed/25885901
http://dx.doi.org/10.1186/s12864-015-1408-5