CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes

BACKGROUND: Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT: Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pip...

Bibliographic Details
Main Authors: Kuhl, Heiner; Li, Ling; Wuertz, Sven; Stöck, Matthias; Liang, Xu-Fang; Klopp, Christophe
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects: Technical Note
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7247394/
https://www.ncbi.nlm.nih.gov/pubmed/32449778
http://dx.doi.org/10.1093/gigascience/giaa034
author Kuhl, Heiner
Li, Ling
Wuertz, Sven
Stöck, Matthias
Liang, Xu-Fang
Klopp, Christophe
collection PubMed
description BACKGROUND: Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT: Chromosome-Scale Assembler (CSA) is a novel, computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. As CSA performs automated assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes, i.e., genomes conventionally built in a laborious fashion using multiple separate assembly tools and manual curation. CSA increases the contig lengths using scaffolding, local re-assembly, and gap closing. On certain datasets, initial contig N50 may be increased up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute servers. Using diverged reference genomes for fish, birds, and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads. CONCLUSIONS: CSA can speed up and simplify chromosome-level assembly and significantly lower the costs of large-scale family-level vertebrate genome projects.
format Online
Article
Text
id pubmed-7247394
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7247394 2020-05-28 Gigascience Technical Note Oxford University Press 2020-05-25 /pmc/articles/PMC7247394/ /pubmed/32449778 http://dx.doi.org/10.1093/gigascience/giaa034 Text en © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes
topic Technical Note