Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab)
Abstract. BACKGROUND: Parallel data manipulation using R has previously been addressed by members of the R community; however, most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from expert ecologists/microbiologists to...
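Main authors: | Constantinos Varsos, Theodore Patkos, Anastasis Oulas, Christina Pavloudi, Alexandros Gougousis, Umer Zeeshan Ijaz, Irene Filiopoulou, Nikolaos Pattakos, Edward Vanden Berghe, Antonio Fernández-Guerra, Sarah Faulwetter, Eva Chatzinikolaou, Evangelos Pafilis, Chryssoula Bekiari, Martin Doerr, Christos Arvanitidis |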
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Pensoft Publishers 2016 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5136650/ https://www.ncbi.nlm.nih.gov/pubmed/27932907 http://dx.doi.org/10.3897/BDJ.4.e8357 |
_version_ | 1782471753257189376 |
---|---|
author | Varsos, Constantinos Patkos, Theodore Oulas, Anastasis Pavloudi, Christina Gougousis, Alexandros Ijaz, Umer Zeeshan Filiopoulou, Irene Pattakos, Nikolaos Vanden Berghe, Edward Fernández-Guerra, Antonio Faulwetter, Sarah Chatzinikolaou, Eva Pafilis, Evangelos Bekiari, Chryssoula Doerr, Martin Arvanitidis, Christos |
author_sort | Varsos, Constantinos |
collection | PubMed |
description | Abstract. BACKGROUND: Parallel data manipulation using R has previously been addressed by members of the R community; however, most of these studies produce ad hoc solutions that are not readily available to the average R user. Our targeted users, ranging from expert ecologists/microbiologists to computational biologists, often experience difficulties in finding optimal ways to exploit the full capacity of their computational resources. In addition, improving the performance of commonly used R scripts becomes increasingly difficult, especially with large datasets. Furthermore, the implementations described here can be of significant interest to expert bioinformaticians or R developers. Therefore, our goals can be summarized as: (i) description of a complete methodology for the analysis of large datasets by combining capabilities of diverse R packages, (ii) presentation of their application through a virtual R laboratory (RvLab) that makes execution of complex functions and visualization of results easy and readily available to the end-user. NEW INFORMATION: In this paper, the novelty stems from implementations of parallel methodologies which rely on the processing of data on different levels of abstraction and the availability of these processes through an integrated portal. Parallel implementation R packages, such as the pbdMPI (Programming with Big Data – Interface to MPI) package, are used to implement Single Program Multiple Data (SPMD) parallelization on primitive mathematical operations, allowing for interplay with functions of the vegan package. The dplyr and RPostgreSQL R packages are further integrated, offering connections to data-frame-like objects (databases) as secondary storage solutions whenever memory demands exceed available RAM resources. The RvLab is running on a PC cluster, using R version 3.1.2 (2014-10-31) on an x86_64-pc-linux-gnu (64-bit) platform, and offers an intuitive virtual environment interface enabling users to perform analysis of ecological and microbial communities based on optimized vegan functions. A beta version of the RvLab is available after registration at: https://portal.lifewatchgreece.eu/ |
format | Online Article Text |
id | pubmed-5136650 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | Pensoft Publishers |
record_format | MEDLINE/PubMed |
spelling | pubmed-5136650 2016-12-08 Biodivers Data J Software Description Pensoft Publishers 2016-11-01 /pmc/articles/PMC5136650/ /pubmed/27932907 http://dx.doi.org/10.3897/BDJ.4.e8357 Text en Constantinos Varsos, Theodore Patkos, Anastasis Oulas, Christina Pavloudi, Alexandros Gougousis, Umer Zeeshan Ijaz, Irene Filiopoulou, Nikolaos Pattakos, Edward Vanden Berghe, Antonio Fernández-Guerra, Sarah Faulwetter, Eva Chatzinikolaou, Evangelos Pafilis, Chryssoula Bekiari, Martin Doerr, Christos Arvanitidis http://creativecommons.org/licenses/by/4.0 This is an open access article distributed under the terms of the Creative Commons Attribution License 4.0 (CC-BY), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Optimized R functions for analysis of ecological community data using the R virtual laboratory (RvLab) |
topic | Software Description |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5136650/ https://www.ncbi.nlm.nih.gov/pubmed/27932907 http://dx.doi.org/10.3897/BDJ.4.e8357 |