
GRASShopPER—An algorithm for de novo assembly based on GPU alignments

Next generation sequencers produce billions of short DNA sequences in a massively parallel manner, which causes a great computational challenge in accurately reconstructing a genome sequence de novo using these short sequences. Here, we propose the GRASShopPER assembler, which follows an approach of...


Bibliographic Details
Main authors: Swiercz, Aleksandra, Frohmberg, Wojciech, Kierzynka, Michal, Wojciechowski, Pawel, Zurkowski, Piotr, Badura, Jan, Laskowski, Artur, Kasprzak, Marta, Blazewicz, Jacek
Format: Online Article Text
Language: English
Published: Public Library of Science 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6095601/
https://www.ncbi.nlm.nih.gov/pubmed/30114279
http://dx.doi.org/10.1371/journal.pone.0202355
_version_ 1783347968994181120
author Swiercz, Aleksandra
Frohmberg, Wojciech
Kierzynka, Michal
Wojciechowski, Pawel
Zurkowski, Piotr
Badura, Jan
Laskowski, Artur
Kasprzak, Marta
Blazewicz, Jacek
author_sort Swiercz, Aleksandra
collection PubMed
description Next generation sequencers produce billions of short DNA sequences in a massively parallel manner, which causes a great computational challenge in accurately reconstructing a genome sequence de novo using these short sequences. Here, we propose the GRASShopPER assembler, which follows an approach of overlap-layout-consensus. It uses an efficient GPU implementation for the sequence alignment during the graph construction stage and a greedy hyper-heuristic algorithm at the fork detection stage. A two-part fork detection method allows us to identify repeated fragments of a genome and to reconstruct them without misassemblies. The assemblies of data sets of bacteria Candidatus Microthrix, nematode Caenorhabditis elegans, and human chromosome 14 were evaluated with the golden standard tool QUAST. In comparison with other assemblers, GRASShopPER provided contigs that covered the largest part of the genomes and, at the same time, kept good values of other metrics, e.g., NG50 and misassembly rate.
format Online
Article
Text
id pubmed-6095601
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6095601 2018-08-30
GRASShopPER—An algorithm for de novo assembly based on GPU alignments
Swiercz, Aleksandra; Frohmberg, Wojciech; Kierzynka, Michal; Wojciechowski, Pawel; Zurkowski, Piotr; Badura, Jan; Laskowski, Artur; Kasprzak, Marta; Blazewicz, Jacek
PLoS One
Research Article
Next generation sequencers produce billions of short DNA sequences in a massively parallel manner, which causes a great computational challenge in accurately reconstructing a genome sequence de novo using these short sequences. Here, we propose the GRASShopPER assembler, which follows an approach of overlap-layout-consensus. It uses an efficient GPU implementation for the sequence alignment during the graph construction stage and a greedy hyper-heuristic algorithm at the fork detection stage. A two-part fork detection method allows us to identify repeated fragments of a genome and to reconstruct them without misassemblies. The assemblies of data sets of bacteria Candidatus Microthrix, nematode Caenorhabditis elegans, and human chromosome 14 were evaluated with the golden standard tool QUAST. In comparison with other assemblers, GRASShopPER provided contigs that covered the largest part of the genomes and, at the same time, kept good values of other metrics, e.g., NG50 and misassembly rate.
Public Library of Science 2018-08-16
/pmc/articles/PMC6095601/ /pubmed/30114279 http://dx.doi.org/10.1371/journal.pone.0202355
Text en
© 2018 Swiercz et al
http://creativecommons.org/licenses/by/4.0/
This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title GRASShopPER—An algorithm for de novo assembly based on GPU alignments
title_sort grasshopper—an algorithm for de novo assembly based on gpu alignments
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6095601/
https://www.ncbi.nlm.nih.gov/pubmed/30114279
http://dx.doi.org/10.1371/journal.pone.0202355
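
The description in this record names NG50 among the metrics used to compare assemblers. As a rough illustration only, and not code from GRASShopPER or QUAST, the sketch below computes NG50 under its usual definition: the length of the shortest contig in the smallest set of longest contigs that together cover at least half of the reference genome length. The function name `ng50` and the sample contig lengths are hypothetical.

```python
def ng50(contig_lengths, genome_length):
    """Return NG50 for a list of contig lengths and a reference genome length.

    Contigs are taken from longest to shortest; the result is the length of
    the contig at which the running total first reaches half of
    genome_length. Returns None if the contigs never cover half the genome.
    """
    target = genome_length / 2
    covered = 0
    for length in sorted(contig_lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None


# Hypothetical contig lengths (in bp) and reference size, for illustration.
contigs = [80, 70, 50, 40, 30, 20]
print(ng50(contigs, 300))  # 80 + 70 = 150 >= 150, so this prints 70
```

Unlike N50, which uses the total assembly length as the denominator, NG50 uses the reference genome length, so an assembly that covers less than half of the genome has no defined NG50 (the sketch returns `None` in that case).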