Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab)

Abstract. BACKGROUND: Parallel data manipulation using R has previously been addressed by members of the R community; however, most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from expert ecologists and microbiologists to...

Bibliographic Details
Main authors: Varsos, Constantinos; Patkos, Theodore; Oulas, Anastasis; Pavloudi, Christina; Gougousis, Alexandros; Ijaz, Umer Zeeshan; Filiopoulou, Irene; Pattakos, Nikolaos; Vanden Berghe, Edward; Fernández-Guerra, Antonio; Faulwetter, Sarah; Chatzinikolaou, Eva; Pafilis, Evangelos; Bekiari, Chryssoula; Doerr, Martin; Arvanitidis, Christos
Format: Online Article Text
Language: English
Published: Pensoft Publishers 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5136650/
https://www.ncbi.nlm.nih.gov/pubmed/27932907
http://dx.doi.org/10.3897/BDJ.4.e8357
_version_ 1782471753257189376
author Varsos, Constantinos
Patkos, Theodore
Oulas, Anastasis
Pavloudi, Christina
Gougousis, Alexandros
Ijaz, Umer Zeeshan
Filiopoulou, Irene
Pattakos, Nikolaos
Vanden Berghe, Edward
Fernández-Guerra, Antonio
Faulwetter, Sarah
Chatzinikolaou, Eva
Pafilis, Evangelos
Bekiari, Chryssoula
Doerr, Martin
Arvanitidis, Christos
author_sort Varsos, Constantinos
collection PubMed
description Abstract. BACKGROUND: Parallel data manipulation using R has previously been addressed by members of the R community; however, most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from expert ecologists and microbiologists to computational biologists, often experience difficulties in finding optimal ways to exploit the full capacity of their computational resources. In addition, improving the performance of commonly used R scripts becomes increasingly difficult, especially with large datasets. Furthermore, the implementations described here can be of significant interest to expert bioinformaticians or R developers. Therefore, our goals can be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining the capabilities of diverse R packages, and (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end user. NEW INFORMATION: The novelty of this paper stems from implementations of parallel methodologies that rely on processing data at different levels of abstraction, and from the availability of these processes through an integrated portal. Parallel implementation R packages, such as pbdMPI (Programming with Big Data – Interface to MPI), are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the vegan package. The dplyr and RPostgreSQL R packages are further integrated, offering connections to data-frame-like objects (databases) as secondary storage solutions whenever memory demands exceed available RAM. The RvLab runs on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface enabling users to analyze ecological and microbial communities with optimized vegan functions. A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu/
format Online
Article
Text
id pubmed-5136650
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Pensoft Publishers
record_format MEDLINE/PubMed
spelling pubmed-5136650 2016-12-08 Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab) Biodivers Data J Software Description Pensoft Publishers 2016-11-01 /pmc/articles/PMC5136650/ /pubmed/27932907 http://dx.doi.org/10.3897/BDJ.4.e8357 Text en Constantinos Varsos, Theodore Patkos, Anastasis Oulas, Christina Pavloudi, Alexandros Gougousis, Umer Zeeshan Ijaz, Irene Filiopoulou, Nikolaos Pattakos, Edward Vanden Berghe, Antonio Fernández-Guerra, Sarah Faulwetter, Eva Chatzinikolaou, Evangelos Pafilis, Chryssoula Bekiari, Martin Doerr, Christos Arvanitidis http://creativecommons.org/licenses/by/4.0 This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab)
topic Software Description
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5136650/
https://www.ncbi.nlm.nih.gov/pubmed/27932907
http://dx.doi.org/10.3897/BDJ.4.e8357