HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies

Many tools have been developed for haplotype assembly—the reconstruction of individual haplotypes using reads mapped to a reference genome sequence. Due to increasing interest in obtaining haplotype-resolved human genomes, a range of new sequencing protocols and technologies have been developed to e...


Bibliographic Details
Main authors: Edge, Peter; Bafna, Vineet; Bansal, Vikas
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411775/
https://www.ncbi.nlm.nih.gov/pubmed/27940952
http://dx.doi.org/10.1101/gr.213462.116
author Edge, Peter
Bafna, Vineet
Bansal, Vikas
collection PubMed
description Many tools have been developed for haplotype assembly—the reconstruction of individual haplotypes using reads mapped to a reference genome sequence. Due to increasing interest in obtaining haplotype-resolved human genomes, a range of new sequencing protocols and technologies have been developed to enable the reconstruction of whole-genome haplotypes. However, existing computational methods designed to handle specific technologies do not scale well on data from different protocols. We describe a new algorithm, HapCUT2, that extends our previous method (HapCUT) to handle multiple sequencing technologies. Using simulations and whole-genome sequencing (WGS) data from multiple different data types—dilution pool sequencing, linked-read sequencing, single molecule real-time (SMRT) sequencing, and proximity ligation (Hi-C) sequencing—we show that HapCUT2 rapidly assembles haplotypes with best-in-class accuracy for all data types. In particular, HapCUT2 scales well for high sequencing coverage and rapidly assembled haplotypes for two long-read WGS data sets on which other methods struggled. Further, HapCUT2 directly models Hi-C specific error modalities, resulting in significant improvements in error rates compared to HapCUT, the only other method that could assemble haplotypes from Hi-C data. Using HapCUT2, haplotype assembly from a 90× coverage whole-genome Hi-C data set yielded high-resolution haplotypes (78.6% of variants phased in a single block) with high pairwise phasing accuracy (∼98% across chromosomes). Our results demonstrate that HapCUT2 is a robust tool for haplotype assembly applicable to data from diverse sequencing technologies.
format Online
Article
Text
id pubmed-5411775
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-5411775 2017-11-01 HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies Edge, Peter Bafna, Vineet Bansal, Vikas Genome Res Method
Cold Spring Harbor Laboratory Press 2017-05 /pmc/articles/PMC5411775/ /pubmed/27940952 http://dx.doi.org/10.1101/gr.213462.116 Text en © 2017 Edge et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies
topic Method