Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests
Main authors: | Edwards, David; de Abreu, Gabriel CG; Labouriau, Rodrigo |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central, 2010 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2823705/ https://www.ncbi.nlm.nih.gov/pubmed/20064242 http://dx.doi.org/10.1186/1471-2105-11-18 |
author | Edwards, David; de Abreu, Gabriel CG; Labouriau, Rodrigo |
---|---|
collection | PubMed |
description | BACKGROUND: Chow and Liu showed that the maximum likelihood tree for multivariate discrete distributions may be found using a maximum weight spanning tree algorithm, for example Kruskal's algorithm. The efficiency of the algorithm makes it tractable for high-dimensional problems. RESULTS: We extend Chow and Liu's approach in two ways: first, to find the forest optimizing a penalized likelihood criterion, for example AIC or BIC, and second, to handle data with both discrete and Gaussian variables. We apply the approach to three datasets: two from gene expression studies and the third from a genetics of gene expression study. The minimal BIC forest supplements a conventional analysis of differential expression by providing a tentative network for the differentially expressed genes. In the genetics of gene expression context the method identifies a network approximating the joint distribution of the DNA markers and the gene expression levels. CONCLUSIONS: The approach is generally useful as a preliminary step towards understanding the overall dependence structure of high-dimensional discrete and/or continuous data. Trees and forests are unrealistically simple models for biological systems, but can provide useful insights. Uses include the following: identification of distinct connected components, which can be analysed separately (dimension reduction); identification of neighbourhoods for more detailed analyses; as initial models for search algorithms with a larger search space, for example decomposable models or Bayesian networks; and identification of interesting features, such as hub nodes. |
format | Text |
id | pubmed-2823705 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2823705 2010-02-18. Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests. Edwards, David; de Abreu, Gabriel CG; Labouriau, Rodrigo. BMC Bioinformatics, Methodology article. BioMed Central, 2010-01-11. /pmc/articles/PMC2823705/ /pubmed/20064242 http://dx.doi.org/10.1186/1471-2105-11-18. Text, en. Copyright ©2010 Edwards et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests |
topic | Methodology article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2823705/ https://www.ncbi.nlm.nih.gov/pubmed/20064242 http://dx.doi.org/10.1186/1471-2105-11-18 |
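
The penalized Chow and Liu construction summarized in the description field can be illustrated with a minimal Python sketch. It assumes that all variables are discrete and uses the standard penalized-likelihood edge weight: 2n times the empirical mutual information, minus the AIC (k = 2) or BIC (k = log n) penalty on the (|X_u| - 1)(|X_v| - 1) extra parameters an edge adds. Kruskal's algorithm is then run only over edges with positive weight, which yields the forest that minimizes the criterion. The function names `mutual_information` and `minimal_ic_forest` are hypothetical. The Gaussian and mixed discrete/Gaussian cases handled in the article are not covered, and this is not the authors' implementation.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(x, y):
    """Empirical mutual information (in nats) between two discrete sequences."""
    n = len(x)
    joint = Counter(zip(x, y))
    px = Counter(x)
    py = Counter(y)
    mi = 0.0
    for (a, b), nab in joint.items():
        # (n_ab / n) * log(n_ab * n / (n_a * n_b))
        mi += (nab / n) * math.log(nab * n / (px[a] * py[b]))
    return mi


def minimal_ic_forest(data, penalty="bic"):
    """Hypothetical helper: edges of the minimal AIC/BIC forest (discrete data only).

    data: dict mapping variable name -> list of observed categories (equal lengths).
    penalty: "aic" (k = 2) or "bic" (k = log n).
    """
    names = list(data)
    n = len(data[names[0]])
    k = 2.0 if penalty == "aic" else math.log(n)

    weighted = []
    for u, v in combinations(names, 2):
        df = (len(set(data[u])) - 1) * (len(set(data[v])) - 1)
        # Drop in -2 log-likelihood from adding edge u-v, minus the IC penalty.
        w = 2 * n * mutual_information(data[u], data[v]) - k * df
        if w > 0:  # only edges that improve the criterion can be in the optimum
            weighted.append((w, u, v))

    # Union-find structure used by Kruskal's algorithm to detect cycles.
    parent = {name: name for name in names}

    def find(a):
        while parent[a] != a:
            parent[a] = parent[parent[a]]  # path halving
            a = parent[a]
        return a

    # Kruskal: scan edges by decreasing weight, skipping any that close a cycle.
    forest = []
    for w, u, v in sorted(weighted, reverse=True):
        ru, rv = find(u), find(v)
        if ru != rv:
            parent[ru] = rv
            forest.append((u, v))
    return forest


if __name__ == "__main__":
    # "a" and "b" are identical; "c" is independent of both.
    data = {
        "a": [0, 0, 1, 1] * 25,
        "b": [0, 0, 1, 1] * 25,
        "c": [0, 1, 0, 1] * 25,
    }
    print(minimal_ic_forest(data))  # [('a', 'b')]
```

Because non-positive edges are discarded before Kruskal's algorithm runs, the result is generally a forest rather than a spanning tree. Its connected components are the separate pieces that the description suggests analysing independently.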