SPRINT: A new parallel framework for R

BACKGROUND: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to us...

Bibliographic Details
Main Authors:	Hill, Jon; Hambley, Matthew; Forster, Thorsten; Mewissen, Muriel; Sloan, Terence M; Scharinger, Florian; Trew, Arthur; Ghazal, Peter
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Software
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2628907/ https://www.ncbi.nlm.nih.gov/pubmed/19114001 http://dx.doi.org/10.1186/1471-2105-9-558

_version_	1782163748959551488
author	Hill, Jon Hambley, Matthew Forster, Thorsten Mewissen, Muriel Sloan, Terence M Scharinger, Florian Trew, Arthur Ghazal, Peter
author_facet	Hill, Jon Hambley, Matthew Forster, Thorsten Mewissen, Muriel Sloan, Terence M Scharinger, Florian Trew, Arthur Ghazal, Peter
author_sort	Hill, Jon
collection	PubMed
description	BACKGROUND: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest. RESULTS: We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors this computation reduces by more than three times the time R takes to complete it on one processor. CONCLUSION: SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. 
It is easy to use and with further development will become more useful as more functions are added to the framework.
format	Text
id	pubmed-2628907
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-26289072009-01-21 SPRINT: A new parallel framework for R Hill, Jon Hambley, Matthew Forster, Thorsten Mewissen, Muriel Sloan, Terence M Scharinger, Florian Trew, Arthur Ghazal, Peter BMC Bioinformatics Software BACKGROUND: Microarray analysis allows the simultaneous measurement of thousands to millions of genes or sequences across tens to thousands of different samples. The analysis of the resulting data tests the limits of existing bioinformatics computing infrastructure. A solution to this issue is to use High Performance Computing (HPC) systems, which contain many processors and more memory than desktop computer systems. Many biostatisticians use R to process the data gleaned from microarray analysis and there is even a dedicated group of packages, Bioconductor, for this purpose. However, to exploit HPC systems, R must be able to utilise the multiple processors available on these systems. There are existing modules that enable R to use multiple processors, but these are either difficult to use for the HPC novice or cannot be used to solve certain classes of problems. A method of exploiting HPC systems, using R, but without recourse to mastering parallel programming paradigms is therefore necessary to analyse genomic data to its fullest. RESULTS: We have designed and built a prototype framework that allows the addition of parallelised functions to R to enable the easy exploitation of HPC systems. The Simple Parallel R INTerface (SPRINT) is a wrapper around such parallelised functions. Their use requires very little modification to existing sequential R scripts and no expertise in parallel computing. As an example we created a function that carries out the computation of a pairwise calculated correlation matrix. This performs well with SPRINT. When executed using SPRINT on an HPC resource of eight processors this computation reduces by more than three times the time R takes to complete it on one processor. 
CONCLUSION: SPRINT allows the biostatistician to concentrate on the research problems rather than the computation, while still allowing exploitation of HPC systems. It is easy to use and with further development will become more useful as more functions are added to the framework. BioMed Central 2008-12-29 /pmc/articles/PMC2628907/ /pubmed/19114001 http://dx.doi.org/10.1186/1471-2105-9-558 Text en Copyright © 2008 Hill et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Software Hill, Jon Hambley, Matthew Forster, Thorsten Mewissen, Muriel Sloan, Terence M Scharinger, Florian Trew, Arthur Ghazal, Peter SPRINT: A new parallel framework for R
title	SPRINT: A new parallel framework for R
title_full	SPRINT: A new parallel framework for R
title_fullStr	SPRINT: A new parallel framework for R
title_full_unstemmed	SPRINT: A new parallel framework for R
title_short	SPRINT: A new parallel framework for R
title_sort	sprint: a new parallel framework for r
topic	Software
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2628907/ https://www.ncbi.nlm.nih.gov/pubmed/19114001 http://dx.doi.org/10.1186/1471-2105-9-558
work_keys_str_mv	AT hilljon sprintanewparallelframeworkforr AT hambleymatthew sprintanewparallelframeworkforr AT forsterthorsten sprintanewparallelframeworkforr AT mewissenmuriel sprintanewparallelframeworkforr AT sloanterencem sprintanewparallelframeworkforr AT scharingerflorian sprintanewparallelframeworkforr AT trewarthur sprintanewparallelframeworkforr AT ghazalpeter sprintanewparallelframeworkforr
