A Bayesian Modelling Approach with Balancing Informative Prior for Analysing Imbalanced Data

Bibliographic details
Main authors: Klein, Kerenaftali; Hennig, Stefanie; Paul, Sanjoy Ketan
Format: Online, Article, Text
Language: English
Published: Public Library of Science, 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4829197/
https://www.ncbi.nlm.nih.gov/pubmed/27070549
http://dx.doi.org/10.1371/journal.pone.0152700

Abstract: When a dataset is imbalanced, predictions for the scarcely sampled subpopulation can be over-influenced by the population that contributes the majority of the data. The aim of this study was to develop a Bayesian modelling approach with a balancing informative prior so that the influence of the imbalance on the overall prediction could be minimised. The new approach was developed to weigh the data in favour of the smaller subset(s). The method was assessed in terms of the bias and precision of model parameter estimates obtained from simulated datasets. Moreover, the method was evaluated in predicting optimal dose levels of tobramycin for various age groups in a motivating example. The bias estimates obtained using the balancing informative prior approach were smaller than those generated using the conventional approach, which does not account for the imbalance in the datasets. The precision estimates were also superior. The method was further evaluated in a motivating example of optimal dosage prediction of tobramycin. The resulting predictions also agreed well with what had been reported in the literature. The proposed Bayesian balancing informative prior approach has shown real potential to adequately weigh the data in favour of smaller subset(s) of data to generate robust prediction models.

Journal: PLoS One
Article type: Research Article
Publication date: 2016-04-12
Rights: © 2016 Klein et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.