Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes

MOTIVATION: Reconstruction of haplotypes for human genomes is an important problem in medical and population genetics. Hi-C sequencing generates read pairs with long-range haplotype information that can be computationally assembled to generate chromosome-spanning haplotypes. However, the haplotypes...

Bibliographic Details
Main author:	Bansal, Vikas
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2019
Subjects:	Ismb/Eccb 2019 Conference Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612846/ https://www.ncbi.nlm.nih.gov/pubmed/31510646 http://dx.doi.org/10.1093/bioinformatics/btz329

_version_	1783432950365290496
author	Bansal, Vikas
author_facet	Bansal, Vikas
author_sort	Bansal, Vikas
collection	PubMed
description	MOTIVATION: Reconstruction of haplotypes for human genomes is an important problem in medical and population genetics. Hi-C sequencing generates read pairs with long-range haplotype information that can be computationally assembled to generate chromosome-spanning haplotypes. However, the haplotypes have limited completeness and low accuracy. Haplotype information from population reference panels can potentially be used to improve the completeness and accuracy of Hi-C haplotyping. RESULTS: In this paper, we describe a likelihood based method to integrate short-range haplotype information from a population reference panel of haplotypes with the long-range haplotype information present in sequence reads from methods such as Hi-C to assemble dense and highly accurate haplotypes for individual genomes. Our method leverages a statistical phasing method and a maximum spanning tree algorithm to determine the optimal second-order approximation of the population-based haplotype likelihood for an individual genome. The population-based likelihood is encoded using pseudo-reads which are then used as input along with sequence reads for haplotype assembly using an existing tool, HapCUT2. Using whole-genome Hi-C data for two human genomes (NA19240 and NA12878), we demonstrate that this integrated phasing method enables the phasing of 97–98% of variants, reduces the switch error rates by 3–6-fold, and outperforms an existing method for combining phase information from sequence reads with population-based phasing. On Strand-seq data for NA12878, our method improves the haplotype completeness from 71.4 to 94.6% and reduces the switch error rate 2-fold, demonstrating its utility for phasing using multiple sequencing technologies. AVAILABILITY AND IMPLEMENTATION: Code and datasets are available at https://github.com/vibansal/IntegratedPhasing.
format	Online Article Text
id	pubmed-6612846
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-66128462019-07-12 Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes Bansal, Vikas Bioinformatics Ismb/Eccb 2019 Conference Proceedings MOTIVATION: Reconstruction of haplotypes for human genomes is an important problem in medical and population genetics. Hi-C sequencing generates read pairs with long-range haplotype information that can be computationally assembled to generate chromosome-spanning haplotypes. However, the haplotypes have limited completeness and low accuracy. Haplotype information from population reference panels can potentially be used to improve the completeness and accuracy of Hi-C haplotyping. RESULTS: In this paper, we describe a likelihood based method to integrate short-range haplotype information from a population reference panel of haplotypes with the long-range haplotype information present in sequence reads from methods such as Hi-C to assemble dense and highly accurate haplotypes for individual genomes. Our method leverages a statistical phasing method and a maximum spanning tree algorithm to determine the optimal second-order approximation of the population-based haplotype likelihood for an individual genome. The population-based likelihood is encoded using pseudo-reads which are then used as input along with sequence reads for haplotype assembly using an existing tool, HapCUT2. Using whole-genome Hi-C data for two human genomes (NA19240 and NA12878), we demonstrate that this integrated phasing method enables the phasing of 97–98% of variants, reduces the switch error rates by 3–6-fold, and outperforms an existing method for combining phase information from sequence reads with population-based phasing. On Strand-seq data for NA12878, our method improves the haplotype completeness from 71.4 to 94.6% and reduces the switch error rate 2-fold, demonstrating its utility for phasing using multiple sequencing technologies. 
AVAILABILITY AND IMPLEMENTATION: Code and datasets are available at https://github.com/vibansal/IntegratedPhasing. Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612846/ /pubmed/31510646 http://dx.doi.org/10.1093/bioinformatics/btz329 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle	Ismb/Eccb 2019 Conference Proceedings Bansal, Vikas Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes
title	Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes
title_full	Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes
title_fullStr	Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes
title_full_unstemmed	Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes
title_short	Integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes
title_sort	integrating read-based and population-based phasing for dense and accurate haplotyping of individual genomes
topic	Ismb/Eccb 2019 Conference Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612846/ https://www.ncbi.nlm.nih.gov/pubmed/31510646 http://dx.doi.org/10.1093/bioinformatics/btz329
work_keys_str_mv	AT bansalvikas integratingreadbasedandpopulationbasedphasingfordenseandaccuratehaplotypingofindividualgenomes
