MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration
BACKGROUND: Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments in the same subjects. While Bioconductor’s methods and classes implemented in different packages manage individual experiments, there...
Main Authors: | Hernandez-Ferrer, Carles; Ruiz-Arenas, Carlos; Beltran-Gomila, Alba; González, Juan R. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Software |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5240259/ https://www.ncbi.nlm.nih.gov/pubmed/28095799 http://dx.doi.org/10.1186/s12859-016-1455-1 |
_version_ | 1782496032822657024 |
---|---|
author | Hernandez-Ferrer, Carles; Ruiz-Arenas, Carlos; Beltran-Gomila, Alba; González, Juan R. |
author_facet | Hernandez-Ferrer, Carles; Ruiz-Arenas, Carlos; Beltran-Gomila, Alba; González, Juan R. |
author_sort | Hernandez-Ferrer, Carles |
collection | PubMed |
description | BACKGROUND: Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments in the same subjects. While Bioconductor’s methods and classes implemented in different packages manage individual experiments, there is not a standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages that have been designed to integrate and visualize biological data often use basic data structures with no clear general methods, such as subsetting or selecting samples. RESULTS: To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and non-complete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new unimplemented data from any biological experiment. CONCLUSIONS: MultiDataSet is a suitable class for data integration under R and Bioconductor framework. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1455-1) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5240259 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-52402592017-01-19 MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration Hernandez-Ferrer, Carles Ruiz-Arenas, Carlos Beltran-Gomila, Alba González, Juan R. BMC Bioinformatics Software BACKGROUND: Reduction in the cost of genomic assays has generated large amounts of biomedical-related data. As a result, current studies perform multiple experiments in the same subjects. While Bioconductor’s methods and classes implemented in different packages manage individual experiments, there is not a standard class to properly manage different omic datasets from the same subjects. In addition, most R/Bioconductor packages that have been designed to integrate and visualize biological data often use basic data structures with no clear general methods, such as subsetting or selecting samples. RESULTS: To cover this need, we have developed MultiDataSet, a new R class based on Bioconductor standards, designed to encapsulate multiple data sets. MultiDataSet deals with the usual difficulties of managing multiple and non-complete data sets while offering a simple and general way of subsetting features and selecting samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third party packages; 2) creating new methods and functions for omic data integration; 3) encapsulating new unimplemented data from any biological experiment. CONCLUSIONS: MultiDataSet is a suitable class for data integration under R and Bioconductor framework. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-1455-1) contains supplementary material, which is available to authorized users. BioMed Central 2017-01-17 /pmc/articles/PMC5240259/ /pubmed/28095799 http://dx.doi.org/10.1186/s12859-016-1455-1 Text en © The Author(s). 
2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Software Hernandez-Ferrer, Carles Ruiz-Arenas, Carlos Beltran-Gomila, Alba González, Juan R. MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration |
title | MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration |
title_full | MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration |
title_fullStr | MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration |
title_full_unstemmed | MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration |
title_short | MultiDataSet: an R package for encapsulating multiple data sets with application to omic data integration |
title_sort | multidataset: an r package for encapsulating multiple data sets with application to omic data integration |
topic | Software |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5240259/ https://www.ncbi.nlm.nih.gov/pubmed/28095799 http://dx.doi.org/10.1186/s12859-016-1455-1 |
work_keys_str_mv | AT hernandezferrercarles multidatasetanrpackageforencapsulatingmultipledatasetswithapplicationtoomicdataintegration AT ruizarenascarlos multidatasetanrpackageforencapsulatingmultipledatasetswithapplicationtoomicdataintegration AT beltrangomilaalba multidatasetanrpackageforencapsulatingmultipledatasetswithapplicationtoomicdataintegration AT gonzalezjuanr multidatasetanrpackageforencapsulatingmultipledatasetswithapplicationtoomicdataintegration |