Haplotype assembly in polyploid genomes and identical by descent shared tracts

Motivation: Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms...

Bibliographic Details
Main authors: Aguiar, Derek; Istrail, Sorin
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3694639/
https://www.ncbi.nlm.nih.gov/pubmed/23813004
http://dx.doi.org/10.1093/bioinformatics/btt213
author Aguiar, Derek
Istrail, Sorin
collection PubMed
description Motivation: Genome-wide haplotype reconstruction from sequence data, or haplotype assembly, is at the center of major challenges in molecular biology and life sciences. For complex eukaryotic organisms like humans, the genome is vast and the population samples are growing so rapidly that algorithms processing high-throughput sequencing data must scale favorably in terms of both accuracy and computational efficiency. Furthermore, current models and methodologies for haplotype assembly (i) do not consider individuals sharing haplotypes jointly, which reduces the size and accuracy of assembled haplotypes, and (ii) are unable to model genomes having more than two sets of homologous chromosomes (polyploidy). Polyploid organisms are increasingly becoming the target of many research groups interested in the genomics of disease, phylogenetics, botany and evolution but there is an absence of theory and methods for polyploid haplotype reconstruction. Results: In this work, we present a number of results, extensions and generalizations of compass graphs and our HapCompass framework. We prove the theoretical complexity of two haplotype assembly optimizations, thereby motivating the use of heuristics. Furthermore, we present graph theory–based algorithms for the problem of haplotype assembly using our previously developed HapCompass framework for (i) novel implementations of haplotype assembly optimizations (minimum error correction), (ii) assembly of a pair of individuals sharing a haplotype tract identical by descent and (iii) assembly of polyploid genomes. We evaluate our methods on 1000 Genomes Project, Pacific Biosciences and simulated sequence data. Availability and Implementation: HapCompass is available for download at http://www.brown.edu/Research/Istrail_Lab/. Contact: Sorin_Istrail@brown.edu Supplementary information: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-3694639
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3694639 2013-06-27 Haplotype assembly in polyploid genomes and identical by descent shared tracts Aguiar, Derek Istrail, Sorin Bioinformatics Ismb/Eccb 2013 Proceedings Papers Committee July 21 to July 23, 2013, Berlin, Germany Oxford University Press 2013-07-01 2013-06-19 /pmc/articles/PMC3694639/ /pubmed/23813004 http://dx.doi.org/10.1093/bioinformatics/btt213 Text en © The Author 2013. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Haplotype assembly in polyploid genomes and identical by descent shared tracts
topic Ismb/Eccb 2013 Proceedings Papers Committee July 21 to July 23, 2013, Berlin, Germany