
Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges

Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers still...


Bibliographic Details
Main authors: El-Metwally, Sara, Hamza, Taher, Zakaria, Magdi, Helmy, Mohamed
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861042/
https://www.ncbi.nlm.nih.gov/pubmed/24348224
http://dx.doi.org/10.1371/journal.pcbi.1003345
author El-Metwally, Sara
Hamza, Taher
Zakaria, Magdi
Helmy, Mohamed
collection PubMed
description Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms.
format Online
Article
Text
id pubmed-3861042
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3861042 2013-12-17 Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges El-Metwally, Sara Hamza, Taher Zakaria, Magdi Helmy, Mohamed PLoS Comput Biol Review Public Library of Science 2013-12-12 /pmc/articles/PMC3861042/ /pubmed/24348224 http://dx.doi.org/10.1371/journal.pcbi.1003345 Text en © 2013 El-Metwally et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861042/
https://www.ncbi.nlm.nih.gov/pubmed/24348224
http://dx.doi.org/10.1371/journal.pcbi.1003345