
Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms

BACKGROUND: Long-read sequencing is revolutionizing genome assembly: as PacBio and Nanopore technologies become more accessible in technicity and in cost, long-read assemblers flourish and are starting to deliver chromosome-level assemblies. However, these long reads are usually error-prone, making...


Bibliographic Details
Main authors: Guiglielmoni, Nadège; Houtain, Antoine; Derzelle, Alessandro; Van Doninck, Karine; Flot, Jean-François
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8178825/
https://www.ncbi.nlm.nih.gov/pubmed/34090340
http://dx.doi.org/10.1186/s12859-021-04118-3
author Guiglielmoni, Nadège
Houtain, Antoine
Derzelle, Alessandro
Van Doninck, Karine
Flot, Jean-François
collection PubMed
description BACKGROUND: Long-read sequencing is revolutionizing genome assembly: as PacBio and Nanopore technologies become more accessible in technicity and in cost, long-read assemblers flourish and are starting to deliver chromosome-level assemblies. However, these long reads are usually error-prone, making the generation of a haploid reference out of a diploid genome a difficult enterprise. Failure to properly collapse haplotypes results in fragmented and structurally incorrect assemblies and wreaks havoc on orthology inference pipelines, yet this serious issue is rarely acknowledged and dealt with in genomic projects, and an independent, comparative benchmark of the capacity of assemblers and post-processing tools to properly collapse or purge haplotypes is still lacking. RESULTS: We tested different assembly strategies on the genome of the rotifer Adineta vaga, a non-model organism for which high coverages of both PacBio and Nanopore reads were available. The assemblers we tested (Canu, Flye, NextDenovo, Ra, Raven, Shasta and wtdbg2) exhibited strikingly different behaviors when dealing with highly heterozygous regions, resulting in variable amounts of uncollapsed haplotypes. Filtering reads generally improved haploid assemblies, and we also benchmarked three post-processing tools aimed at detecting and purging uncollapsed haplotypes in long-read assemblies: HaploMerger2, purge_haplotigs and purge_dups. CONCLUSIONS: We provide a thorough evaluation of popular assemblers on a non-model eukaryote genome with variable levels of heterozygosity. Our study highlights several strategies using pre and post-processing approaches to generate haploid assemblies with high continuity and completeness. This benchmark will help users to improve haploid assemblies of non-model organisms, and evaluate the quality of their own assemblies. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04118-3.
format Online
Article
Text
id pubmed-8178825
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8178825 2021-06-07 Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms. Guiglielmoni, Nadège; Houtain, Antoine; Derzelle, Alessandro; Van Doninck, Karine; Flot, Jean-François. BMC Bioinformatics. Research Article. BioMed Central 2021-06-05 /pmc/articles/PMC8178825/ /pubmed/34090340 http://dx.doi.org/10.1186/s12859-021-04118-3 Text en
© The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Overcoming uncollapsed haplotypes in long-read assemblies of non-model organisms
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8178825/
https://www.ncbi.nlm.nih.gov/pubmed/34090340
http://dx.doi.org/10.1186/s12859-021-04118-3