Fast-SG: an alignment-free algorithm for hybrid assembly
Main authors: | Di Genova, Alex; Ruz, Gonzalo A; Sagot, Marie-France; Maass, Alejandro |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007556/ https://www.ncbi.nlm.nih.gov/pubmed/29741627 http://dx.doi.org/10.1093/gigascience/giy048 |
_version_ | 1783333061461540864 |
---|---|
author | Di Genova, Alex; Ruz, Gonzalo A; Sagot, Marie-France; Maass, Alejandro |
collection | PubMed |
description | BACKGROUND: Long-read sequencing technologies are the ultimate solution for genome repeats, allowing near reference-level reconstructions of large genomes. However, long-read de novo assembly pipelines are computationally intense and require a considerable amount of coverage, thereby hindering their broad application to the assembly of large genomes. Alternatively, hybrid assembly methods that combine short- and long-read sequencing technologies can reduce the time and cost required to produce de novo assemblies of large genomes. RESULTS: Here, we propose a new method, called Fast-SG, that uses a new ultrafast alignment-free algorithm specifically designed for constructing a scaffolding graph using light-weight data structures. Fast-SG can construct the graph from either short or long reads. This allows the reuse of efficient algorithms designed for short-read data and permits the definition of novel modular hybrid assembly pipelines. Using comprehensive standard datasets and benchmarks, we show how Fast-SG outperforms the state-of-the-art short-read aligners when building the scaffolding graph and can be used to extract linking information from either raw or error-corrected long reads. We also show how a hybrid assembly approach using Fast-SG with shallow long-read coverage (5X) and moderate computational resources can produce long-range and accurate reconstructions of the genomes of Arabidopsis thaliana (Ler-0) and human (NA12878). CONCLUSIONS: Fast-SG opens a door to achieve accurate hybrid long-range reconstructions of large genomes with low effort, high portability, and low cost. |
format | Online Article Text |
id | pubmed-6007556 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6007556 2018-07-05 Gigascience Technical Note Oxford University Press 2018-05-05 /pmc/articles/PMC6007556/ /pubmed/29741627 http://dx.doi.org/10.1093/gigascience/giy048 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Fast-SG: an alignment-free algorithm for hybrid assembly |
topic | Technical Note |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6007556/ https://www.ncbi.nlm.nih.gov/pubmed/29741627 http://dx.doi.org/10.1093/gigascience/giy048 |