An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads

Bibliographic Details
Main authors: Warnke, Julia D; Ali, Hesham H
Format: Online article (text)
Language: English
Published: BioMed Central, 2013
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3816315/
https://www.ncbi.nlm.nih.gov/pubmed/24564333
http://dx.doi.org/10.1186/1471-2105-14-S11-S7
Collection: PubMed
Abstract:
BACKGROUND: Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly.
RESULTS: Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies.
CONCLUSIONS: Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies.
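The abstract describes coarsening a read overlap graph into a series of progressively smaller graphs and reading clusters of reads off each level. This record does not state the authors' overlap criterion or how nodes are merged, so the Python sketch below swaps in a generic, standard technique: exact suffix-prefix overlaps of at least `min_len` bases (forward strand only, no error tolerance) followed by greedy heavy-edge matching, a common multilevel coarsening step. The function names, parameters, and toy reads are hypothetical and illustrative; this is not the authors' implementation.

```python
from collections import defaultdict

# Hypothetical helper (assumption): exact suffix-prefix overlap on the forward
# strand only, with no reverse complements and no tolerance for sequencing errors.
def suffix_prefix_overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b`, or 0 if shorter than min_len."""
    min_len = max(1, min_len)  # k = 0 would make a[-0:] the whole string
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


def overlap_graph(reads: list[str], min_len: int = 3) -> dict[tuple[int, int], int]:
    """Undirected weighted overlap graph: nodes are read indices, weight = longest overlap."""
    edges = {}
    for i in range(len(reads)):  # naive all-pairs comparison; suitable for a toy example only
        for j in range(i + 1, len(reads)):
            w = max(suffix_prefix_overlap(reads[i], reads[j], min_len),
                    suffix_prefix_overlap(reads[j], reads[i], min_len))
            if w > 0:
                edges[(i, j)] = w
    return edges


def coarsen(n: int, edges: dict[tuple[int, int], int]):
    """One level of greedy heavy-edge matching (a generic technique, assumed here)."""
    partner = [-1] * n
    # Visit edges heaviest first and pair up endpoints that are both still unmatched.
    for (u, v), _w in sorted(edges.items(), key=lambda e: -e[1]):
        if partner[u] == -1 and partner[v] == -1:
            partner[u], partner[v] = v, u
    # Each matched pair (or unmatched node) becomes one node of the coarser graph.
    mapping = [-1] * n
    n_new = 0
    for u in range(n):
        if mapping[u] == -1:
            mapping[u] = n_new
            if partner[u] != -1:
                mapping[partner[u]] = n_new
            n_new += 1
    # Parallel edges between merged nodes are collapsed by summing their weights.
    coarse = defaultdict(int)
    for (u, v), w in edges.items():
        a, b = mapping[u], mapping[v]
        if a != b:
            coarse[(min(a, b), max(a, b))] += w
    return mapping, dict(coarse), n_new


def multilevel_clusters(reads: list[str], min_len: int = 3, levels: int = 3):
    """Return one clustering of read indices per level, from finest to coarsest."""
    n, edges = len(reads), overlap_graph(reads, min_len)
    clusters = [[i] for i in range(n)]  # level 0: every read is its own node
    hierarchy = [clusters]
    for _ in range(levels):
        if not edges:  # nothing left to merge
            break
        mapping, edges, n_new = coarsen(n, edges)
        merged = [[] for _ in range(n_new)]
        for old, new in enumerate(mapping):
            merged[new].extend(clusters[old])
        clusters, n = merged, n_new
        hierarchy.append(clusters)
    return hierarchy


if __name__ == "__main__":
    toy_reads = ["ACGTAC", "TACGGA", "GGATTC", "CCCCCC"]
    for level, groups in enumerate(multilevel_clusters(toy_reads)):
        print(level, groups)
```

On the four toy reads this prints three levels: `[[0], [1], [2], [3]]`, then `[[0, 1], [2], [3]]`, then `[[0, 1, 2], [3]]`. The read `CCCCCC` has no qualifying overlap and stays a singleton at every level. Datasets of millions of reads would need an index-based overlap step instead of the all-pairs loop used here.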
Record ID: pubmed-3816315
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Research)
Publication date: 2013-09-13
Rights: Copyright © 2013 Warnke and Ali; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.