
SeedsGraph: an efficient assembler for next-generation sequencing data

DNA sequencing technology has been rapidly evolving, and produces a large number of short reads with a fast rising tendency. This has led to a resurgence of research in whole genome shotgun assembly algorithms. We start the assembly algorithm by clustering the short reads in a cloud computing framew...


Bibliographic Details
Main Authors: Wang, Chunyu, Guo, Maozu, Liu, Xiaoyan, Liu, Yang, Zou, Quan
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460749/
https://www.ncbi.nlm.nih.gov/pubmed/26044652
http://dx.doi.org/10.1186/1755-8794-8-S2-S13
_version_ 1782375428405592064
author Wang, Chunyu
Guo, Maozu
Liu, Xiaoyan
Liu, Yang
Zou, Quan
collection PubMed
description DNA sequencing technology has been rapidly evolving, and produces a large number of short reads with a fast rising tendency. This has led to a resurgence of research in whole genome shotgun assembly algorithms. We start the assembly algorithm by clustering the short reads in a cloud computing framework, and the clustering process groups fragments according to their original consensus long-sequence similarity. We condense each group of reads to a chain of seeds, which is a kind of substring with reads aligned, and then build a graph accordingly. Finally, we analyze the graph to find Euler paths, and assemble the reads related in the paths into contigs, and then lay out contigs with mate-pair information for scaffolds. The result shows that our algorithm is efficient and feasible for a large set of reads such as in next-generation sequencing technology.
format Online
Article
Text
id pubmed-4460749
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4460749 2015-06-29
SeedsGraph: an efficient assembler for next-generation sequencing data
Wang, Chunyu
Guo, Maozu
Liu, Xiaoyan
Liu, Yang
Zou, Quan
BMC Med Genomics
Research
DNA sequencing technology has been rapidly evolving, and produces a large number of short reads with a fast rising tendency. This has led to a resurgence of research in whole genome shotgun assembly algorithms. We start the assembly algorithm by clustering the short reads in a cloud computing framework, and the clustering process groups fragments according to their original consensus long-sequence similarity. We condense each group of reads to a chain of seeds, which is a kind of substring with reads aligned, and then build a graph accordingly. Finally, we analyze the graph to find Euler paths, and assemble the reads related in the paths into contigs, and then lay out contigs with mate-pair information for scaffolds. The result shows that our algorithm is efficient and feasible for a large set of reads such as in next-generation sequencing technology.
BioMed Central 2015-05-29
/pmc/articles/PMC4460749/ /pubmed/26044652 http://dx.doi.org/10.1186/1755-8794-8-S2-S13
Text en
Copyright © 2015 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title SeedsGraph: an efficient assembler for next-generation sequencing data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4460749/
https://www.ncbi.nlm.nih.gov/pubmed/26044652
http://dx.doi.org/10.1186/1755-8794-8-S2-S13
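
Illustrative note: the abstract describes building a graph from seeds and finding Euler paths to assemble contigs. The sketch below is not the authors' implementation. It swaps in a plain de Bruijn-style toy graph (nodes are (k-1)-mers, edges are k-mers) as a stand-in for the seed graph, and uses Hierholzer's algorithm to find one Euler path. The function names `kmer_edges`, `euler_path`, and `spell_contig`, the reads, and the value of `k` are all hypothetical and chosen only for illustration.

```python
from collections import defaultdict


def kmer_edges(reads, k):
    """Turn reads into directed edges (prefix -> suffix), one per distinct k-mer.

    Assumption: a toy de Bruijn construction, not the paper's seed graph.
    """
    kmers = []
    for read in reads:
        for i in range(len(read) - k + 1):
            kmers.append(read[i:i + k])
    # dict.fromkeys removes duplicate k-mers while keeping first-seen order.
    return [(km[:-1], km[1:]) for km in dict.fromkeys(kmers)]


def euler_path(edges):
    """Return one Euler path as a list of nodes, or None if none exists."""
    if not edges:
        return []
    out_edges = defaultdict(list)
    in_deg = defaultdict(int)
    out_deg = defaultdict(int)
    nodes = set()
    for u, v in edges:
        out_edges[u].append(v)
        out_deg[u] += 1
        in_deg[v] += 1
        nodes.update((u, v))

    # Degree conditions for a directed Euler path.
    starts = [n for n in nodes if out_deg[n] - in_deg[n] == 1]
    ends = [n for n in nodes if in_deg[n] - out_deg[n] == 1]
    balanced = all(abs(out_deg[n] - in_deg[n]) <= 1 for n in nodes)
    if not balanced or len(starts) > 1 or len(starts) != len(ends):
        return None

    # Hierholzer's algorithm with an explicit stack.
    start = starts[0] if starts else edges[0][0]
    stack, path = [start], []
    while stack:
        u = stack[-1]
        if out_edges[u]:
            stack.append(out_edges[u].pop())
        else:
            path.append(stack.pop())
    path.reverse()
    # If some edges were never reached, the graph is not connected.
    return path if len(path) == len(edges) + 1 else None


def spell_contig(path):
    """Merge consecutive overlapping nodes into one sequence."""
    if not path:
        return ""
    return path[0] + "".join(node[-1] for node in path[1:])


reads = ["ATGCG", "TGCGA"]  # hypothetical toy reads
contig = spell_contig(euler_path(kmer_edges(reads, k=3)))
```

With these two toy reads the graph is a single chain, so the Euler path is unique. On real data, repeats produce several valid Euler paths, which is one reason assemblers bring in extra evidence such as the mate-pair information mentioned in the abstract.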