
Learning mixed graphical models with separate sparsity parameters and stability-based model selection

BACKGROUND: Mixed graphical models (MGMs) are graphical models learned over a combination of continuous and discrete variables. Mixed variable types are common in biomedical datasets. MGMs consist of a parameterized joint probability density, which implies a network structure over these heterogeneou...


Bibliographic Details
Main Authors:	Sedgewick, Andrew J., Shi, Ivy, Donovan, Rory M., Benos, Panayiotis V.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4905606/ https://www.ncbi.nlm.nih.gov/pubmed/27294886 http://dx.doi.org/10.1186/s12859-016-1039-0

author	Sedgewick, Andrew J.; Shi, Ivy; Donovan, Rory M.; Benos, Panayiotis V.
author_sort	Sedgewick, Andrew J.
collection	PubMed
description	BACKGROUND: Mixed graphical models (MGMs) are graphical models learned over a combination of continuous and discrete variables. Mixed variable types are common in biomedical datasets. MGMs consist of a parameterized joint probability density, which implies a network structure over these heterogeneous variables. The network structure reveals direct associations between the variables, and the joint probability density allows one to ask arbitrary probabilistic questions on the data. This information can be used for feature selection, classification and other important tasks. RESULTS: We studied the properties of MGM learning and applications of MGMs to high-dimensional data (biological and simulated). Our results show that MGMs reliably uncover the underlying graph structure, and when used for classification, their performance is comparable to popular discriminative methods (lasso regression and support vector machines). We also show that imposing separate sparsity penalties for edges connecting different types of variables significantly improves edge recovery performance. To choose these sparsity parameters, we propose a new efficient model selection method, named Stable Edge-specific Penalty Selection (StEPS). StEPS is an expansion of an earlier method, StARS, to mixed variable types. In terms of edge recovery, StEPS-selected MGMs outperform those models selected using standard techniques, including AIC, BIC and cross-validation. In addition, we use a heuristic search that is linear in the size of the sparsity value search space as opposed to the cubic grid search required by other model selection methods. We applied our method to clinical and mRNA expression data from the Lung Genomics Research Consortium (LGRC) and the learned MGM correctly recovered connections between the diagnosis of obstructive or interstitial lung disease, two diagnostic breathing tests, and cigarette smoking history. Our model also suggested biologically relevant mRNA markers that are linked to these three clinical variables. CONCLUSIONS: MGMs are able to accurately recover dependencies between sets of continuous and discrete variables in both simulated and biomedical datasets. Separation of sparsity penalties by edge type is essential for accurate network edge recovery. Furthermore, our stability-based method for model selection determines sparsity parameters faster and more accurately (in terms of edge recovery) than other model selection methods. With the ongoing availability of comprehensive clinical and biomedical datasets, MGMs are expected to become a valuable tool for investigating disease mechanisms and answering an array of critical healthcare questions.
format	Online Article Text
id	pubmed-4905606
institution	National Center for Biotechnology Information
language	English
publishDate	2016
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4905606 2016-06-14 BMC Bioinformatics Research BioMed Central 2016-06-06 /pmc/articles/PMC4905606/ /pubmed/27294886 http://dx.doi.org/10.1186/s12859-016-1039-0 Text en © Sedgewick et al. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	Learning mixed graphical models with separate sparsity parameters and stability-based model selection
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4905606/ https://www.ncbi.nlm.nih.gov/pubmed/27294886 http://dx.doi.org/10.1186/s12859-016-1039-0


Similar Items