Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests

BACKGROUND: Chow and Liu showed that the maximum likelihood tree for multivariate discrete distributions may be found using a maximum weight spanning tree algorithm, for example Kruskal's algorithm. The efficiency of the algorithm makes it tractable for high-dimensional problems. RESULTS: We ex...

Bibliographic Details
Main authors: Edwards, David; de Abreu, Gabriel CG; Labouriau, Rodrigo
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2823705/
https://www.ncbi.nlm.nih.gov/pubmed/20064242
http://dx.doi.org/10.1186/1471-2105-11-18
_version_ 1782177667957653504
author Edwards, David
de Abreu, Gabriel CG
Labouriau, Rodrigo
author_facet Edwards, David
de Abreu, Gabriel CG
Labouriau, Rodrigo
author_sort Edwards, David
collection PubMed
description BACKGROUND: Chow and Liu showed that the maximum likelihood tree for multivariate discrete distributions may be found using a maximum weight spanning tree algorithm, for example Kruskal's algorithm. The efficiency of the algorithm makes it tractable for high-dimensional problems. RESULTS: We extend Chow and Liu's approach in two ways: first, to find the forest optimizing a penalized likelihood criterion, for example AIC or BIC, and second, to handle data with both discrete and Gaussian variables. We apply the approach to three datasets: two from gene expression studies and the third from a genetics of gene expression study. The minimal BIC forest supplements a conventional analysis of differential expression by providing a tentative network for the differentially expressed genes. In the genetics of gene expression context the method identifies a network approximating the joint distribution of the DNA markers and the gene expression levels. CONCLUSIONS: The approach is generally useful as a preliminary step towards understanding the overall dependence structure of high-dimensional discrete and/or continuous data. Trees and forests are unrealistically simple models for biological systems, but can provide useful insights. Uses include the following: identification of distinct connected components, which can be analysed separately (dimension reduction); identification of neighbourhoods for more detailed analyses; as initial models for search algorithms with a larger search space, for example decomposable models or Bayesian networks; and identification of interesting features, such as hub nodes.
format Text
id pubmed-2823705
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2823705 2010-02-18 Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests Edwards, David de Abreu, Gabriel CG Labouriau, Rodrigo BMC Bioinformatics Methodology article
BioMed Central 2010-01-11 /pmc/articles/PMC2823705/ /pubmed/20064242 http://dx.doi.org/10.1186/1471-2105-11-18 Text en Copyright ©2010 Edwards et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Methodology article
Edwards, David
de Abreu, Gabriel CG
Labouriau, Rodrigo
Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests
title Selecting high-dimensional mixed graphical models using minimal AIC or BIC forests
topic Methodology article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2823705/
https://www.ncbi.nlm.nih.gov/pubmed/20064242
http://dx.doi.org/10.1186/1471-2105-11-18
work_keys_str_mv AT edwardsdavid selectinghighdimensionalmixedgraphicalmodelsusingminimalaicorbicforests
AT deabreugabrielcg selectinghighdimensionalmixedgraphicalmodelsusingminimalaicorbicforests
AT labouriaurodrigo selectinghighdimensionalmixedgraphicalmodelsusingminimalaicorbicforests
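
The abstract describes extending Chow and Liu's maximum weight spanning tree approach to find a forest that optimizes a penalized likelihood criterion such as AIC or BIC. The following is a minimal Python sketch of that idea for discrete variables only, not the authors' implementation. It rests on these assumptions, none of which is stated in this record: the edge weight is the deviance 2·N·I(u,v) computed from the empirical mutual information I(u,v) in nats; the penalty is log(N) per parameter for BIC or 2 per parameter for AIC; the number of parameters for an edge is (levels of u - 1)·(levels of v - 1); and only edges with positive penalized weight are eligible. The function names and the column-dictionary input format are hypothetical. The mixed discrete/Gaussian case mentioned in the abstract is not covered.

```python
import math
from collections import Counter
from itertools import combinations


def mutual_information(xs, ys):
    """Empirical mutual information (in nats) of two discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def min_penalized_forest(columns, penalty="bic"):
    """Kruskal-style search for a forest over discrete variables.

    columns: dict mapping variable name -> list of observed values
             (hypothetical input format; all lists have equal length).
    penalty: "bic" (log(N) per parameter) or "aic" (2 per parameter).
    Returns a list of (u, v) edges.
    """
    names = list(columns)
    n = len(columns[names[0]])
    per_param = math.log(n) if penalty == "bic" else 2.0

    # Penalized weight of each candidate edge; keep only positive ones,
    # since a non-positive weight means the edge does not improve the
    # criterion (assumption stated in the lead-in).
    candidates = []
    for u, v in combinations(names, 2):
        deviance = 2.0 * n * mutual_information(columns[u], columns[v])
        df = (len(set(columns[u])) - 1) * (len(set(columns[v])) - 1)
        weight = deviance - per_param * df
        if weight > 0:
            candidates.append((weight, u, v))

    # Kruskal's algorithm: consider the heaviest edges first and use a
    # union-find structure to skip edges that would close a cycle.
    candidates.sort(reverse=True)
    parent = {name: name for name in names}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    forest = []
    for _, u, v in candidates:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_u] = root_v
            forest.append((u, v))
    return forest


if __name__ == "__main__":
    # Toy data: a and b are identical, c and d are identical, and the
    # two pairs are empirically independent of each other.
    data = {
        "a": [0, 1] * 20,
        "b": [0, 1] * 20,
        "c": [0, 0, 1, 1] * 10,
        "d": [0, 0, 1, 1] * 10,
    }
    print(min_penalized_forest(data, penalty="bic"))
```

Because edges with non-positive penalized weight are discarded before Kruskal's algorithm runs, the result can have several connected components rather than a single spanning tree, which is the property the abstract points to when it mentions analysing connected components separately.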