An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads
BACKGROUND: Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous...
Main Authors: | Warnke, Julia D; Ali, Hesham H |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2013 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3816315/ https://www.ncbi.nlm.nih.gov/pubmed/24564333 http://dx.doi.org/10.1186/1471-2105-14-S11-S7 |
_version_ | 1782477946686013440 |
---|---|
author | Warnke, Julia D Ali, Hesham H |
author_facet | Warnke, Julia D Ali, Hesham H |
author_sort | Warnke, Julia D |
collection | PubMed |
description | BACKGROUND: Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. RESULTS: Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. CONCLUSIONS: Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. |
format | Online Article Text |
id | pubmed-3816315 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-38163152013-11-04 An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads Warnke, Julia D Ali, Hesham H BMC Bioinformatics Research BACKGROUND: Next generation sequencing technologies have greatly advanced many research areas of the biomedical sciences through their capability to generate massive amounts of genetic information at unprecedented rates. The advent of next generation sequencing has led to the development of numerous computational tools to analyze and assemble the millions to billions of short sequencing reads produced by these technologies. While these tools filled an important gap, current approaches for storing, processing, and analyzing short read datasets generally have remained simple and lack the complexity needed to efficiently model the produced reads and assemble them correctly. RESULTS: Previously, we presented an overlap graph coarsening scheme for modeling read overlap relationships on multiple levels. Most current read assembly and analysis approaches use a single graph or set of clusters to represent the relationships among a read dataset. Instead, we use a series of graphs to represent the reads and their overlap relationships across a spectrum of information granularity. At each information level our algorithm is capable of generating clusters of reads from the reduced graph, forming an integrated graph modeling and clustering approach for read analysis and assembly. Previously we applied our algorithm to simulated and real 454 datasets to assess its ability to efficiently model and cluster next generation sequencing data. In this paper we extend our algorithm to large simulated and real Illumina datasets to demonstrate that our algorithm is practical for both sequencing technologies. CONCLUSIONS: Our overlap graph theoretic algorithm is able to model next generation sequencing reads at various levels of granularity through the process of graph coarsening. Additionally, our model allows for efficient representation of the read overlap relationships, is scalable for large datasets, and is practical for both Illumina and 454 sequencing technologies. BioMed Central 2013-09-13 /pmc/articles/PMC3816315/ /pubmed/24564333 http://dx.doi.org/10.1186/1471-2105-14-S11-S7 Text en Copyright © 2013 Warnke and Ali; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Warnke, Julia D Ali, Hesham H An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads |
title | An efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads |
title_sort | efficient and scalable graph modeling approach for capturing information at different levels in next generation sequencing reads |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3816315/ https://www.ncbi.nlm.nih.gov/pubmed/24564333 http://dx.doi.org/10.1186/1471-2105-14-S11-S7 |