SeedsGraph: an efficient assembler for next-generation sequencing data
DNA sequencing technology has been rapidly evolving, and produces a large number of short reads with a fast rising tendency. This has led to a resurgence of research in whole genome shotgun assembly algorithms. We start the assembly algorithm by clustering the short reads in a cloud computing framew...
Main Authors: | Wang, Chunyu; Guo, Maozu; Liu, Xiaoyan; Liu, Yang; Zou, Quan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2015 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460749/ https://www.ncbi.nlm.nih.gov/pubmed/26044652 http://dx.doi.org/10.1186/1755-8794-8-S2-S13 |
_version_ | 1782375428405592064 |
---|---|
author | Wang, Chunyu Guo, Maozu Liu, Xiaoyan Liu, Yang Zou, Quan |
author_facet | Wang, Chunyu Guo, Maozu Liu, Xiaoyan Liu, Yang Zou, Quan |
author_sort | Wang, Chunyu |
collection | PubMed |
description | DNA sequencing technology has been rapidly evolving, and produces a large number of short reads with a fast rising tendency. This has led to a resurgence of research in whole genome shotgun assembly algorithms. We start the assembly algorithm by clustering the short reads in a cloud computing framework, and the clustering process groups fragments according to their original consensus long-sequence similarity. We condense each group of reads to a chain of seeds, which is a kind of substring with reads aligned, and then build a graph accordingly. Finally, we analyze the graph to find Euler paths, and assemble the reads related in the paths into contigs, and then lay out contigs with mate-pair information for scaffolds. The result shows that our algorithm is efficient and feasible for a large set of reads such as in next-generation sequencing technology. |
format | Online Article Text |
id | pubmed-4460749 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-44607492015-06-29 SeedsGraph: an efficient assembler for next-generation sequencing data Wang, Chunyu Guo, Maozu Liu, Xiaoyan Liu, Yang Zou, Quan BMC Med Genomics Research DNA sequencing technology has been rapidly evolving, and produces a large number of short reads with a fast rising tendency. This has led to a resurgence of research in whole genome shotgun assembly algorithms. We start the assembly algorithm by clustering the short reads in a cloud computing framework, and the clustering process groups fragments according to their original consensus long-sequence similarity. We condense each group of reads to a chain of seeds, which is a kind of substring with reads aligned, and then build a graph accordingly. Finally, we analyze the graph to find Euler paths, and assemble the reads related in the paths into contigs, and then lay out contigs with mate-pair information for scaffolds. The result shows that our algorithm is efficient and feasible for a large set of reads such as in next-generation sequencing technology. BioMed Central 2015-05-29 /pmc/articles/PMC4460749/ /pubmed/26044652 http://dx.doi.org/10.1186/1755-8794-8-S2-S13 Text en Copyright © 2015 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Wang, Chunyu Guo, Maozu Liu, Xiaoyan Liu, Yang Zou, Quan SeedsGraph: an efficient assembler for next-generation sequencing data |
title | SeedsGraph: an efficient assembler for next-generation sequencing data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460749/ https://www.ncbi.nlm.nih.gov/pubmed/26044652 http://dx.doi.org/10.1186/1755-8794-8-S2-S13 |
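
The description in this record mentions building a graph from the reads and finding Euler paths to assemble contigs. The record does not say how SeedsGraph constructs its seed chains, so the sketch below is not the authors' method. It swaps in a plain k-mer (de Bruijn style) graph and Hierholzer's algorithm, purely to illustrate what "assembling reads along an Euler path" can look like. All function names (`build_graph`, `euler_path`, `spell_contig`, `assemble`) and the toy reads are hypothetical, and the code assumes error-free reads whose graph actually contains an Euler path.

```python
from collections import defaultdict


def build_graph(reads, k):
    """Build a directed graph where each distinct k-mer is one edge
    from its (k-1)-mer prefix to its (k-1)-mer suffix.
    Assumption: a k-mer graph stands in for the paper's seed graph."""
    graph = defaultdict(list)
    seen = set()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer not in seen:  # each distinct k-mer contributes one edge
                seen.add(kmer)
                graph[kmer[:-1]].append(kmer[1:])
    return graph


def euler_path(graph):
    """Hierholzer's algorithm. Assumes an Euler path exists."""
    out_deg = {u: len(vs) for u, vs in graph.items()}
    in_deg = defaultdict(int)
    for vs in graph.values():
        for v in vs:
            in_deg[v] += 1
    nodes = set(out_deg) | set(in_deg)
    # Start at the node with one more outgoing than incoming edge, if any;
    # otherwise (Euler cycle) start at any node that has outgoing edges.
    start = next(
        (n for n in nodes if out_deg.get(n, 0) - in_deg.get(n, 0) == 1),
        None,
    )
    if start is None:
        start = next(iter(graph))
    remaining = {u: list(vs) for u, vs in graph.items()}
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if remaining.get(u):
            stack.append(remaining[u].pop())  # follow an unused edge
        else:
            path.append(stack.pop())  # dead end: emit node
    return path[::-1]


def spell_contig(path):
    """Concatenate the first node with the last base of each later node."""
    return path[0] + "".join(node[-1] for node in path[1:])


def assemble(reads, k):
    return spell_contig(euler_path(build_graph(reads, k)))


# Toy usage with three overlapping, error-free reads (made-up data).
print(assemble(["ATGGCG", "GGCGTG", "CGTGCA"], 4))  # ATGGCGTGCA
```

Real assemblers also have to handle sequencing errors, repeats, and mate-pair scaffolding, none of which this sketch attempts.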