Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges
Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers still...
Main authors: | El-Metwally, Sara; Hamza, Taher; Zakaria, Magdi; Helmy, Mohamed |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2013 |
Subjects: | Review |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861042/ https://www.ncbi.nlm.nih.gov/pubmed/24348224 http://dx.doi.org/10.1371/journal.pcbi.1003345 |
_version_ | 1782295590066978816 |
---|---|
author | El-Metwally, Sara Hamza, Taher Zakaria, Magdi Helmy, Mohamed |
author_facet | El-Metwally, Sara Hamza, Taher Zakaria, Magdi Helmy, Mohamed |
author_sort | El-Metwally, Sara |
collection | PubMed |
description | Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms. |
format | Online Article Text |
id | pubmed-3861042 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-38610422013-12-17 Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges El-Metwally, Sara Hamza, Taher Zakaria, Magdi Helmy, Mohamed PLoS Comput Biol Review Decoding DNA symbols using next-generation sequencers was a major breakthrough in genomic research. Despite the many advantages of next-generation sequencers, e.g., the high-throughput sequencing rate and relatively low cost of sequencing, the assembly of the reads produced by these sequencers remains a major challenge. In this review, we address the basic framework of next-generation genome sequence assemblers, which comprises four basic stages: preprocessing filtering, a graph construction process, a graph simplification process, and postprocessing filtering. Here we discuss them as a framework of four stages for data analysis and processing and survey a variety of techniques, algorithms, and software tools used during each stage. We also discuss the challenges that face current assemblers in the next-generation environment to determine the current state of the art. We recommend a layered architecture approach for constructing a general assembler that can handle the sequences generated by different sequencing platforms. Public Library of Science 2013-12-12 /pmc/articles/PMC3861042/ /pubmed/24348224 http://dx.doi.org/10.1371/journal.pcbi.1003345 Text en © 2013 El-Metwally et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Review El-Metwally, Sara Hamza, Taher Zakaria, Magdi Helmy, Mohamed Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges |
title | Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges |
title_full | Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges |
title_fullStr | Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges |
title_full_unstemmed | Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges |
title_short | Next-Generation Sequence Assembly: Four Stages of Data Processing and Computational Challenges |
title_sort | next-generation sequence assembly: four stages of data processing and computational challenges |
topic | Review |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3861042/ https://www.ncbi.nlm.nih.gov/pubmed/24348224 http://dx.doi.org/10.1371/journal.pcbi.1003345 |
work_keys_str_mv | AT elmetwallysara nextgenerationsequenceassemblyfourstagesofdataprocessingandcomputationalchallenges AT hamzataher nextgenerationsequenceassemblyfourstagesofdataprocessingandcomputationalchallenges AT zakariamagdi nextgenerationsequenceassemblyfourstagesofdataprocessingandcomputationalchallenges AT helmymohamed nextgenerationsequenceassemblyfourstagesofdataprocessingandcomputationalchallenges |