Fast-SG: an alignment-free algorithm for hybrid assembly

BACKGROUND: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their...

Bibliographic Details
Main authors: Di Genova, Alex; Ruz, Gonzalo A; Sagot, Marie-France; Maass, Alejandro
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007556/
https://www.ncbi.nlm.nih.gov/pubmed/29741627
http://dx.doi.org/10.1093/gigascience/giy048
author Di Genova, Alex
Ruz, Gonzalo A
Sagot, Marie-France
Maass, Alejandro
author_facet Di Genova, Alex
Ruz, Gonzalo A
Sagot, Marie-France
Maass, Alejandro
author_sort Di Genova, Alex
collection PubMed
description BACKGROUND: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. RESULTS: Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). CONCLUSIONS: Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost.
format Online
Article
Text
id pubmed-6007556
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-60075562018-07-05 Fast-SG: an alignment-free algorithm for hybrid assembly Di Genova, Alex Ruz, Gonzalo A Sagot, Marie-France Maass, Alejandro Gigascience Technical Note BACKGROUND: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. RESULTS: Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). CONCLUSIONS: Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost. Oxford University Press 2018-05-05 /pmc/articles/PMC6007556/ /pubmed/29741627 http://dx.doi.org/10.1093/gigascience/giy048 Text en © The Author(s) 2018. Published by Oxford University Press.
http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Technical Note
Di Genova, Alex
Ruz, Gonzalo A
Sagot, Marie-France
Maass, Alejandro
Fast-SG: an alignment-free algorithm for hybrid assembly
title Fast-SG: an alignment-free algorithm for hybrid assembly
title_full Fast-SG: an alignment-free algorithm for hybrid assembly
title_fullStr Fast-SG: an alignment-free algorithm for hybrid assembly
title_full_unstemmed Fast-SG: an alignment-free algorithm for hybrid assembly
title_short Fast-SG: an alignment-free algorithm for hybrid assembly
title_sort fast-sg: an alignment-free algorithm for hybrid assembly
topic Technical Note
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007556/
https://www.ncbi.nlm.nih.gov/pubmed/29741627
http://dx.doi.org/10.1093/gigascience/giy048
work_keys_str_mv AT digenovaalex fastsganalignmentfreealgorithmforhybridassembly
AT ruzgonzaloa fastsganalignmentfreealgorithmforhybridassembly
AT sagotmariefrance fastsganalignmentfreealgorithmforhybridassembly
AT maassalejandro fastsganalignmentfreealgorithmforhybridassembly