HiCanu: accurate assembly of segmental duplications, satellites, and allelic variants from high-fidelity long reads

Bibliographic Details
Main authors: Nurk, Sergey, Walenz, Brian P., Rhie, Arang, Vollger, Mitchell R., Logsdon, Glennis A., Grothe, Robert, Miga, Karen H., Eichler, Evan E., Phillippy, Adam M., Koren, Sergey
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7545148/
https://www.ncbi.nlm.nih.gov/pubmed/32801147
http://dx.doi.org/10.1101/gr.263566.120
collection PubMed
description Complete and accurate genome assemblies form the basis of most downstream genomic analyses and are of critical importance. Recent genome assembly projects have relied on a combination of noisy long-read sequencing and accurate short-read sequencing, with the former offering greater assembly continuity and the latter providing higher consensus accuracy. The recently introduced Pacific Biosciences (PacBio) HiFi sequencing technology bridges this divide by delivering long reads (>10 kbp) with high per-base accuracy (>99.9%). Here we present HiCanu, a modification of the Canu assembler designed to leverage the full potential of HiFi reads via homopolymer compression, overlap-based error correction, and aggressive false overlap filtering. We benchmark HiCanu with a focus on the recovery of haplotype diversity, major histocompatibility complex (MHC) variants, satellite DNAs, and segmental duplications. For diploid human genomes sequenced to 30× HiFi coverage, HiCanu achieved superior accuracy and allele recovery compared to the current state of the art. On the effectively haploid CHM13 human cell line, HiCanu achieved an NG50 contig size of 77 Mbp with a per-base consensus accuracy of 99.999% (QV50), surpassing recent assemblies of high-coverage, ultralong Oxford Nanopore Technologies (ONT) reads in terms of both accuracy and continuity. This HiCanu assembly correctly resolves 337 out of 341 validation BACs sampled from known segmental duplications and provides the first preliminary assemblies of nine complete human centromeric regions. Although gaps and errors still remain within the most challenging regions of the genome, these results represent a significant advance toward the complete assembly of human genomes.
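The abstract names homopolymer compression as one of HiCanu's key techniques and reports consensus accuracy as a Phred-scaled QV. The Python snippet below is a minimal illustrative sketch, not Canu's implementation, and the function names `homopolymer_compress` and `accuracy_to_qv` are hypothetical. It assumes the common definition of homopolymer compression (each run of identical bases collapsed to a single base) and the standard relation QV = -10 * log10(error rate), under which 99.999% accuracy corresponds to QV50 and 99.9% to QV30.

```python
import math
from itertools import groupby


def homopolymer_compress(seq: str) -> str:
    """Collapse each run of identical bases to one base (hypothetical helper).

    Example: "ACGTTTTGA" -> "ACGTGA".
    """
    # groupby yields one (base, run_iterator) pair per run of equal characters
    return "".join(base for base, _run in groupby(seq))


def accuracy_to_qv(accuracy: float) -> float:
    """Convert per-base accuracy (0..1, exclusive of 1) to a Phred-scaled QV."""
    error_rate = 1.0 - accuracy
    return -10.0 * math.log10(error_rate)


if __name__ == "__main__":
    # Two reads that differ only in the length of the T run compress identically
    print(homopolymer_compress("ACGTTTTGA"))  # ACGTGA
    print(homopolymer_compress("ACGTTTGA"))   # ACGTGA
    # 99.999% accuracy is QV50; 99.9% accuracy is QV30
    print(round(accuracy_to_qv(0.99999), 1))  # 50.0
    print(round(accuracy_to_qv(0.999), 1))    # 30.0
```

Because run-length differences disappear after compression, two reads that disagree only in a homopolymer length (often described as the main residual error type in HiFi reads) compare as identical over that segment, which is one reason compression can help overlap detection.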
id pubmed-7545148
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7545148 2020-10-19 Genome Res Method Cold Spring Harbor Laboratory Press 2020-09 /pmc/articles/PMC7545148/ /pubmed/32801147 http://dx.doi.org/10.1101/gr.263566.120 Text en © 2020 Nurk et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by/4.0/ This article, published in Genome Research, is available under a Creative Commons License (Attribution 4.0 International), as described at http://creativecommons.org/licenses/by/4.0/.
topic Method