
Graphical modeling of binary data using the LASSO: a simulation study

BACKGROUND: Graphical models were identified as a promising new approach to modeling high-dimensional clinical data. They provided a probabilistic tool to display, analyze and visualize the net-like dependence structures by drawing a graph describing the conditional dependencies between the variable...


Bibliographic Details
Main authors: Strobl, Ralf; Grill, Eva; Mansmann, Ulrich
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3305667/
https://www.ncbi.nlm.nih.gov/pubmed/22353192
http://dx.doi.org/10.1186/1471-2288-12-16
author Strobl, Ralf
Grill, Eva
Mansmann, Ulrich
collection PubMed
description BACKGROUND: Graphical models have been identified as a promising new approach to modeling high-dimensional clinical data. They provide a probabilistic tool to display, analyze and visualize net-like dependence structures by drawing a graph that describes the conditional dependencies between the variables. Until now, research has focused mainly on building Gaussian graphical models for continuous multivariate data following a multivariate normal distribution. Satisfactory solutions for binary data were missing. We adapted the method of Meinshausen and Bühlmann to binary data and used the LASSO for logistic regression. The objective of this paper was to examine the performance of the Bolasso in the development of graphical models for high-dimensional binary data. We hypothesized that the Bolasso is superior to competing LASSO methods in identifying graphical models. METHODS: We analyzed the Bolasso for deriving graphical models in comparison with other LASSO-based methods. Model performance was assessed in a simulation study with random data generated via symmetric local logistic regression models and Gibbs sampling. The main outcome variables were the Structural Hamming Distance and the Youden Index. We applied the results of the simulation study to a real-life data set of functioning data from patients with head and neck cancer. RESULTS: Bootstrap aggregating as incorporated in the Bolasso algorithm greatly improved performance at larger sample sizes. The number of bootstraps had minimal impact on performance. Bolasso performed reasonably well with a cutpoint of 0.90 and a small penalty term. Optimal prediction for Bolasso leads to very conservative models in comparison with AIC, BIC or cross-validated optimal penalty terms. CONCLUSIONS: Bootstrap aggregating may improve variable selection if the underlying selection process is not too unstable due to small sample size and if one is mainly interested in reducing the false discovery rate. We propose using the Bolasso for graphical modeling in large sample sizes.
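The description names a bootstrap cutpoint and two outcome measures without defining them. The following is a minimal sketch, not the authors' implementation. It assumes that the cutpoint is the minimum fraction of bootstrap replicates in which an edge must be selected, and it uses the usual definitions for undirected graphs: the Structural Hamming Distance as the number of edges that differ between the true and the estimated graph, and the Youden Index as sensitivity + specificity - 1 over all possible edges. The function names and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def edge(i, j):
    """Represent an undirected edge as an order-free pair of node indices."""
    return frozenset((i, j))


def aggregate_edges(bootstrap_edge_sets, cutpoint=0.90):
    """Keep edges selected in at least `cutpoint` of the bootstrap replicates.

    Assumption: this is how the cutpoint of 0.90 is applied; the record
    itself does not spell out the rule.
    """
    n_boot = len(bootstrap_edge_sets)
    counts = Counter(e for edges in bootstrap_edge_sets for e in set(edges))
    return {e for e, c in counts.items() if c / n_boot >= cutpoint}


def structural_hamming_distance(true_edges, estimated_edges):
    """Number of edge insertions or deletions separating two undirected graphs."""
    return len(set(true_edges) ^ set(estimated_edges))


def youden_index(true_edges, estimated_edges, n_nodes):
    """Sensitivity + specificity - 1, computed over all possible node pairs.

    Requires at least one true edge and at least one true non-edge.
    """
    true_edges, estimated_edges = set(true_edges), set(estimated_edges)
    n_possible = len(list(combinations(range(n_nodes), 2)))
    tp = len(true_edges & estimated_edges)
    fn = len(true_edges - estimated_edges)
    fp = len(estimated_edges - true_edges)
    tn = n_possible - tp - fn - fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity + specificity - 1


# Toy example: 4 nodes, true graph is the chain 0-1-2-3 (made-up data).
true_graph = {edge(0, 1), edge(1, 2), edge(2, 3)}

# Ten hypothetical bootstrap replicates of selected edges.
replicates = (
    [{edge(0, 1), edge(1, 2), edge(2, 3)}] * 5
    + [{edge(0, 1), edge(1, 2)}] * 4
    + [{edge(0, 1), edge(0, 3)}]
)

estimated = aggregate_edges(replicates, cutpoint=0.90)
shd = structural_hamming_distance(true_graph, estimated)
youden = youden_index(true_graph, estimated, n_nodes=4)
```

In this toy case the edge 2-3 appears in only half of the replicates and is dropped, which illustrates the conservative behaviour the description reports: no false edges are kept, at the cost of one missed true edge.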
format Online
Article
Text
id pubmed-3305667
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3305667 2012-03-16 Graphical modeling of binary data using the LASSO: a simulation study Strobl, Ralf Grill, Eva Mansmann, Ulrich BMC Med Res Methodol Research Article BioMed Central 2012-02-21 /pmc/articles/PMC3305667/ /pubmed/22353192 http://dx.doi.org/10.1186/1471-2288-12-16 Text en Copyright ©2012 Strobl et al; BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Graphical modeling of binary data using the LASSO: a simulation study
topic Research Article