
parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics

Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only due to large sample sizes...


Bibliographic Details
Main authors: Miroshnikov, Alexey; Conlon, Erin M.
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4178156/
https://www.ncbi.nlm.nih.gov/pubmed/25259608
http://dx.doi.org/10.1371/journal.pone.0108425
author Miroshnikov, Alexey
Conlon, Erin M.
collection PubMed
description Recent advances in big data and analytics research have provided a wealth of large data sets that are too big to be analyzed in their entirety, due to restrictions on computer memory or storage size. New Bayesian methods have been developed for data sets that are large only due to large sample sizes. These methods partition big data sets into subsets and perform independent Bayesian Markov chain Monte Carlo analyses on the subsets. The methods then combine the independent subset posterior samples to estimate a posterior density given the full data set. These approaches were shown to be effective for Bayesian models including logistic regression models, Gaussian mixture models and hierarchical models. Here, we introduce the R package parallelMCMCcombine which carries out four of these techniques for combining independent subset posterior samples. We illustrate each of the methods using a Bayesian logistic regression model for simulation data and a Bayesian Gamma model for real data; we also demonstrate features and capabilities of the R package. The package assumes the user has carried out the Bayesian analysis and has produced the independent subposterior samples outside of the package. The methods are primarily suited to models with unknown parameters of fixed dimension that exist in continuous parameter spaces. We envision this tool will allow researchers to explore the various methods for their specific applications and will assist future progress in this rapidly developing field.
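The partition, sample, and combine workflow summarized in the description can be sketched in a few lines. The Python sketch below is illustrative only: it does not use or reproduce the parallelMCMCcombine API (the package itself is for R), and the description does not name the four combining techniques. The function names `combine_weighted` and `demo`, the toy Normal-mean model with known standard deviation and a flat prior, and the choice of precision-weighted averaging of draws (in the style of consensus Monte Carlo) as the combining rule are all assumptions made for this example. Direct draws from the closed-form subposterior stand in for the independent MCMC runs.

```python
import random
import statistics


def combine_weighted(subchains):
    """Combine equal-length subposterior chains for one scalar parameter.

    Assumed combining rule (not taken from the record): draw s of the
    combined chain is the average of draw s from every subset chain,
    weighted by the inverse sample variance of each chain.
    """
    weights = [1.0 / statistics.variance(chain) for chain in subchains]
    total = sum(weights)
    n_draws = min(len(chain) for chain in subchains)
    return [
        sum(w * chain[s] for w, chain in zip(weights, subchains)) / total
        for s in range(n_draws)
    ]


def demo(seed=0, n=6000, n_subsets=4, n_draws=5000, sigma=2.0):
    """Toy run: partition the data, sample each subset, combine the draws."""
    rng = random.Random(seed)
    data = [rng.gauss(1.5, sigma) for _ in range(n)]

    # Step 1: partition the full data set into disjoint subsets.
    subsets = [data[m::n_subsets] for m in range(n_subsets)]

    # Step 2: stand-in for independent MCMC on each subset. With a flat
    # prior and known sigma, the subposterior of the mean is
    # Normal(mean(subset), sigma^2 / len(subset)), so we draw from it directly.
    subchains = [
        [rng.gauss(statistics.fmean(sub), sigma / len(sub) ** 0.5)
         for _ in range(n_draws)]
        for sub in subsets
    ]

    # Step 3: combine the subposterior draws into full-data posterior draws.
    combined = combine_weighted(subchains)
    return data, combined


if __name__ == "__main__":
    data, combined = demo()
    print("full-data mean:       ", statistics.fmean(data))
    print("combined draws mean:  ", statistics.fmean(combined))
    print("full-data posterior sd:", 2.0 / len(data) ** 0.5)
    print("combined draws sd:    ", statistics.stdev(combined))
```

In this Gaussian toy case the weighted average should closely match the full-data posterior, which makes it a convenient sanity check; for non-Gaussian posteriors a rule of this kind is only an approximation, which is one reason to compare several combining methods.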
format Online
Article
Text
id pubmed-4178156
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4178156 2014-10-02 PLoS One Research Article
Public Library of Science 2014-09-26 /pmc/articles/PMC4178156/ /pubmed/25259608 http://dx.doi.org/10.1371/journal.pone.0108425 Text en © 2014 Miroshnikov, Conlon http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title parallelMCMCcombine: An R Package for Bayesian Methods for Big Data and Analytics
topic Research Article