CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes
BACKGROUND: Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT: Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pip...
Main authors: | Kuhl, Heiner; Li, Ling; Wuertz, Sven; Stöck, Matthias; Liang, Xu-Fang; Klopp, Christophe |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Technical Note |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7247394/ https://www.ncbi.nlm.nih.gov/pubmed/32449778 http://dx.doi.org/10.1093/gigascience/giaa034 |
_version_ | 1783538146572500992 |
---|---|
author | Kuhl, Heiner; Li, Ling; Wuertz, Sven; Stöck, Matthias; Liang, Xu-Fang; Klopp, Christophe |
author_facet | Kuhl, Heiner; Li, Ling; Wuertz, Sven; Stöck, Matthias; Liang, Xu-Fang; Klopp, Christophe |
author_sort | Kuhl, Heiner |
collection | PubMed |
description | BACKGROUND: Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT: Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. As CSA performs automated assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes, i.e., conventionally built in a laborious fashion using multiple separate assembly tools and manual curation. CSA increases the contig lengths using scaffolding, local re-assembly, and gap closing. On certain datasets, initial contig N50 may be increased up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute-servers. Using diverged reference genomes for fish, birds, and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads. CONCLUSIONS: CSA can speed up and simplify chromosome-level assembly and significantly lower costs of large-scale family-level vertebrate genome projects. |
format | Online Article Text |
id | pubmed-7247394 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7247394 2020-05-28 CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes Kuhl, Heiner Li, Ling Wuertz, Sven Stöck, Matthias Liang, Xu-Fang Klopp, Christophe Gigascience Technical Note BACKGROUND: Easy-to-use and fast bioinformatics pipelines for long-read assembly that go beyond the contig level to generate highly continuous chromosome-scale genomes from raw data remain scarce. RESULT: Chromosome-Scale Assembler (CSA) is a novel computationally highly efficient bioinformatics pipeline that fills this gap. CSA integrates information from scaffolded assemblies (e.g., Hi-C or 10X Genomics) or even from diverged reference genomes into the assembly process. As CSA performs automated assembly of chromosome-sized scaffolds, we benchmark its performance against state-of-the-art reference genomes, i.e., conventionally built in a laborious fashion using multiple separate assembly tools and manual curation. CSA increases the contig lengths using scaffolding, local re-assembly, and gap closing. On certain datasets, initial contig N50 may be increased up to 4.5-fold. For smaller vertebrate genomes, chromosome-scale assemblies can be achieved within 12 h using low-cost, high-end desktop computers. Mammalian genomes can be processed within 16 h on compute-servers. Using diverged reference genomes for fish, birds, and mammals, we demonstrate that CSA calculates chromosome-scale assemblies from long-read data and genome comparisons alone. Even contig-level draft assemblies of diverged genomes are helpful for reconstructing chromosome-scale sequences. CSA is also capable of assembling ultra-long reads. CONCLUSIONS: CSA can speed up and simplify chromosome-level assembly and significantly lower costs of large-scale family-level vertebrate genome projects. Oxford University Press 2020-05-25 /pmc/articles/PMC7247394/ /pubmed/32449778 http://dx.doi.org/10.1093/gigascience/giaa034 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Technical Note Kuhl, Heiner Li, Ling Wuertz, Sven Stöck, Matthias Liang, Xu-Fang Klopp, Christophe CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes |
title | CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes |
title_full | CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes |
title_fullStr | CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes |
title_full_unstemmed | CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes |
title_short | CSA: A high-throughput chromosome-scale assembly pipeline for vertebrate genomes |
title_sort | csa: a high-throughput chromosome-scale assembly pipeline for vertebrate genomes |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7247394/ https://www.ncbi.nlm.nih.gov/pubmed/32449778 http://dx.doi.org/10.1093/gigascience/giaa034 |