Model averaging strategies for structure learning in Bayesian networks with limited data

BACKGROUND: Considerable progress has been made on algorithms for learning the structure of Bayesian networks from data. Model averaging by using bootstrap replicates with feature selection by thresholding is a widely used solution for learning features with high confidence. Yet, in the context of l...

Bibliographic Details
Main authors: Broom, Bradley M, Do, Kim-Anh, Subramanian, Devika
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426799/
https://www.ncbi.nlm.nih.gov/pubmed/23320818
http://dx.doi.org/10.1186/1471-2105-13-S13-S10
_version_ 1782241544871346176
author Broom, Bradley M
Do, Kim-Anh
Subramanian, Devika
collection PubMed
description BACKGROUND: Considerable progress has been made on algorithms for learning the structure of Bayesian networks from data. Model averaging by using bootstrap replicates with feature selection by thresholding is a widely used solution for learning features with high confidence. Yet, in the context of limited data, many questions remain unanswered. What scoring functions are most effective for model averaging? Does the bias arising from the discreteness of the bootstrap significantly affect learning performance? Is it better to pick the single best network or to average multiple networks learnt from each bootstrap resample? How should thresholds for learning statistically significant features be selected?
RESULTS: The best scoring functions are the Dirichlet Prior Scoring Metric with small λ and the Bayesian Dirichlet metric. Correcting the bias arising from the discreteness of the bootstrap worsens learning performance. It is better to pick the single best network learnt from each bootstrap resample. We describe a permutation-based method for determining significance thresholds for feature selection in bagged models. We show that in contexts with limited data, Bayesian bagging using the Dirichlet Prior Scoring Metric (DPSM) is the most effective learning strategy, and that modifying the scoring function to penalize complex networks hampers model averaging. We establish these results using a systematic study of two well-known benchmarks, specifically ALARM and INSURANCE. We also apply our network construction method to gene expression data from the Cancer Genome Atlas Glioblastoma multiforme dataset and show that survival is related to the clinical covariates age and gender and to clusters for interferon-induced genes and growth inhibition genes.
CONCLUSIONS: For small data sets, our approach performs significantly better than previously published methods.
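The abstract's core procedure (learn one network per bootstrap resample, count how often each feature appears, keep features above a threshold) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `toy_learner` is a hypothetical stand-in for a real structure learner and does not compute DPSM or any Bayesian Dirichlet score, the function names are invented for this sketch, and the threshold of 0.9 is arbitrary rather than derived from the permutation-based method the abstract describes.

```python
import random
from collections import Counter
from itertools import combinations


def toy_learner(rows, min_agreement=0.8):
    """Hypothetical stand-in for a structure learner (NOT a Bayesian network score).

    Returns an undirected edge (x, y) whenever variables x and y take the
    same value in more than `min_agreement` of the rows.
    """
    variables = sorted(rows[0])
    edges = set()
    for x, y in combinations(variables, 2):
        agreement = sum(r[x] == r[y] for r in rows) / len(rows)
        if agreement > min_agreement:
            edges.add((x, y))
    return edges


def bootstrap_edge_frequencies(rows, learn, n_boot=200, seed=0):
    """Fraction of bootstrap resamples whose learnt network contains each edge."""
    rng = random.Random(seed)
    counts = Counter()
    n = len(rows)
    for _ in range(n_boot):
        # Ordinary bootstrap: draw n rows with replacement.
        resample = [rows[rng.randrange(n)] for _ in range(n)]
        # Keep a single network per resample and tally its edges.
        counts.update(learn(resample))
    return {edge: count / n_boot for edge, count in counts.items()}


def select_features(freqs, threshold):
    """Keep edges whose bootstrap frequency reaches the threshold."""
    return {edge for edge, f in freqs.items() if f >= threshold}


# Toy data: A and B always agree; C agrees with A in half of the rows.
rows = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(40)]
freqs = bootstrap_edge_frequencies(rows, toy_learner)
selected = select_features(freqs, threshold=0.9)  # 0.9 is an arbitrary choice
```

Because A and B agree in every row, the edge `("A", "B")` is found in every resample and survives thresholding. Passing the learner as a parameter keeps the averaging loop independent of the scoring function, so a real score-based search could be substituted for `toy_learner`.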
format Online
Article
Text
id pubmed-3426799
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3426799 2012-08-24 Model averaging strategies for structure learning in Bayesian networks with limited data Broom, Bradley M Do, Kim-Anh Subramanian, Devika BMC Bioinformatics Research BioMed Central 2012-08-24 /pmc/articles/PMC3426799/ /pubmed/23320818 http://dx.doi.org/10.1186/1471-2105-13-S13-S10 Text en Copyright ©2012 Broom et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Model averaging strategies for structure learning in Bayesian networks with limited data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3426799/
https://www.ncbi.nlm.nih.gov/pubmed/23320818
http://dx.doi.org/10.1186/1471-2105-13-S13-S10