
MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration

BACKGROUND: Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments in the same subjects. While Bioconductor’s methods and classes implemented in different packages manage individual experiments, there...


Bibliographic Details
Main authors: Hernandez-Ferrer, Carles; Ruiz-Arenas, Carlos; Beltran-Gomila, Alba; González, Juan R.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5240259/
https://www.ncbi.nlm.nih.gov/pubmed/28095799
http://dx.doi.org/10.1186/s12859-016-1455-1
author Hernandez-Ferrer, Carles
Ruiz-Arenas, Carlos
Beltran-Gomila, Alba
González, Juan R.
collection PubMed
description BACKGROUND: Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments on the same subjects. While Bioconductor’s methods and classes implemented in different packages manage individual experiments, there is no standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages designed to integrate and visualize biological data use basic data structures with no clear general methods for operations such as subsetting or selecting samples. RESULTS: To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and incomplete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third-party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new unimplemented data from any biological experiment. CONCLUSIONS: MultiDataSet is a suitable class for data integration under the R and Bioconductor framework. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1455-1) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5240259
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5240259 2017-01-19 BMC Bioinformatics Software BioMed Central 2017-01-17 /pmc/articles/PMC5240259/ /pubmed/28095799 http://dx.doi.org/10.1186/s12859-016-1455-1 Text en
© The Author(s) 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5240259/
https://www.ncbi.nlm.nih.gov/pubmed/28095799
http://dx.doi.org/10.1186/s12859-016-1455-1