HapCUT2: robust and accurate haplotype assembly for diverse sequencing technologies
Main authors: | Edge, Peter; Bafna, Vineet; Bansal, Vikas |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory Press, 2017 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5411775/ https://www.ncbi.nlm.nih.gov/pubmed/27940952 http://dx.doi.org/10.1101/gr.213462.116 |
author | Edge, Peter; Bafna, Vineet; Bansal, Vikas |
---|---|
collection | PubMed |
description | Many tools have been developed for haplotype assembly—the reconstruction of individual haplotypes using reads mapped to a reference genome sequence. Due to increasing interest in obtaining haplotype-resolved human genomes, a range of new sequencing protocols and technologies have been developed to enable the reconstruction of whole-genome haplotypes. However, existing computational methods designed to handle specific technologies do not scale well on data from different protocols. We describe a new algorithm, HapCUT2, that extends our previous method (HapCUT) to handle multiple sequencing technologies. Using simulations and whole-genome sequencing (WGS) data from multiple different data types—dilution pool sequencing, linked-read sequencing, single molecule real-time (SMRT) sequencing, and proximity ligation (Hi-C) sequencing—we show that HapCUT2 rapidly assembles haplotypes with best-in-class accuracy for all data types. In particular, HapCUT2 scales well for high sequencing coverage and rapidly assembled haplotypes for two long-read WGS data sets on which other methods struggled. Further, HapCUT2 directly models Hi-C specific error modalities, resulting in significant improvements in error rates compared to HapCUT, the only other method that could assemble haplotypes from Hi-C data. Using HapCUT2, haplotype assembly from a 90× coverage whole-genome Hi-C data set yielded high-resolution haplotypes (78.6% of variants phased in a single block) with high pairwise phasing accuracy (∼98% across chromosomes). Our results demonstrate that HapCUT2 is a robust tool for haplotype assembly applicable to data from diverse sequencing technologies. |
id | pubmed-5411775 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Genome Research (article type: Method), 2017-05 |
rights | © 2017 Edge et al.; published by Cold Spring Harbor Laboratory Press. This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/. |
topic | Method |