Cargando…

Fractal MapReduce decomposition of sequence alignment

BACKGROUND: The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, positions the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribut...

Descripción completa

Detalles Bibliográficos
Autores principales: Almeida, Jonas S, Grüneberg, Alexander, Maass, Wolfgang, Vinga, Susana
Formato: Online Artículo Texto
Lenguaje:English
Publicado: BioMed Central 2012
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394223/
https://www.ncbi.nlm.nih.gov/pubmed/22551205
http://dx.doi.org/10.1186/1748-7188-7-12
_version_ 1782237832431009792
author Almeida, Jonas S
Grüneberg, Alexander
Maass, Wolfgang
Vinga, Susana
author_facet Almeida, Jonas S
Grüneberg, Alexander
Maass, Wolfgang
Vinga, Susana
author_sort Almeida, Jonas S
collection PubMed
description BACKGROUND: The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, positions the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required. RESULTS: In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique, that has been found to provide a "alignment-free" solution to sequence analysis and comparison. That is, a solution that does not require dynamic programming, relying on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units: with no resort to dynamic programming. CONCLUSIONS: The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing. AVAILABILITY: Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm".
format Online
Article
Text
id pubmed-3394223
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-33942232012-07-16 Fractal MapReduce decomposition of sequence alignment Almeida, Jonas S Grüneberg, Alexander Maass, Wolfgang Vinga, Susana Algorithms Mol Biol Research BACKGROUND: The dramatic fall in the cost of genomic sequencing, and the increasing convenience of distributed cloud computing resources, positions the MapReduce coding pattern as a cornerstone of scalable bioinformatics algorithm development. In some cases an algorithm will find a natural distribution via use of map functions to process vectorized components, followed by a reduce of aggregate intermediate results. However, for some data analysis procedures such as sequence analysis, a more fundamental reformulation may be required. RESULTS: In this report we describe a solution to sequence comparison that can be thoroughly decomposed into multiple rounds of map and reduce operations. The route taken makes use of iterated maps, a fractal analysis technique, that has been found to provide a "alignment-free" solution to sequence analysis and comparison. That is, a solution that does not require dynamic programming, relying on a numeric Chaos Game Representation (CGR) data structure. This claim is demonstrated in this report by calculating the length of the longest similar segment by inspecting only the USM coordinates of two analogous units: with no resort to dynamic programming. CONCLUSIONS: The procedure described is an attempt at extreme decomposition and parallelization of sequence alignment in anticipation of a volume of genomic sequence data that cannot be met by current algorithmic frameworks. The solution found is delivered with a browser-based application (webApp), highlighting the browser's emergence as an environment for high performance distributed computing. AVAILABILITY: Public distribution of accompanying software library with open source and version control at http://usm.github.com. Also available as a webApp through Google Chrome's WebStore http://chrome.google.com/webstore: search with "usm". BioMed Central 2012-05-02 /pmc/articles/PMC3394223/ /pubmed/22551205 http://dx.doi.org/10.1186/1748-7188-7-12 Text en Copyright ©2012 Almeida et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Almeida, Jonas S
Grüneberg, Alexander
Maass, Wolfgang
Vinga, Susana
Fractal MapReduce decomposition of sequence alignment
title Fractal MapReduce decomposition of sequence alignment
title_full Fractal MapReduce decomposition of sequence alignment
title_fullStr Fractal MapReduce decomposition of sequence alignment
title_full_unstemmed Fractal MapReduce decomposition of sequence alignment
title_short Fractal MapReduce decomposition of sequence alignment
title_sort fractal mapreduce decomposition of sequence alignment
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3394223/
https://www.ncbi.nlm.nih.gov/pubmed/22551205
http://dx.doi.org/10.1186/1748-7188-7-12
work_keys_str_mv AT almeidajonass fractalmapreducedecompositionofsequencealignment
AT grunebergalexander fractalmapreducedecompositionofsequencealignment
AT maasswolfgang fractalmapreducedecompositionofsequencealignment
AT vingasusana fractalmapreducedecompositionofsequencealignment