A mixture copula Bayesian network model for multimodal genomic data
Gaussian Bayesian networks have become a widely used framework to estimate directed associations between joint Gaussian variables, where the network structure encodes the decomposition of multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normal...
Main authors: | Zhang, Qingyang; Shi, Xuan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5397279/ https://www.ncbi.nlm.nih.gov/pubmed/28469391 http://dx.doi.org/10.1177/1176935117702389 |
_version_ | 1783230231762436096 |
---|---|
author | Zhang, Qingyang Shi, Xuan |
author_facet | Zhang, Qingyang Shi, Xuan |
author_sort | Zhang, Qingyang |
collection | PubMed |
description | Gaussian Bayesian networks have become a widely used framework to estimate directed associations between joint Gaussian variables, where the network structure encodes the decomposition of multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normality assumption is moderately or severely violated, making it unsuitable for dealing with recent genomic data such as the Cancer Genome Atlas data. In the present paper, we propose a mixture copula Bayesian network model which provides great flexibility in modeling non-Gaussian and multimodal data for causal inference. The parameters in mixture copula functions can be efficiently estimated by a routine expectation–maximization algorithm. A heuristic search algorithm based on Bayesian information criterion is developed to estimate the network structure, and prediction can be further improved by the best-scoring network out of multiple predictions from random initial values. Our method outperforms Gaussian Bayesian networks and regular copula Bayesian networks in terms of modeling flexibility and prediction accuracy, as demonstrated using a cell signaling data set. We apply the proposed methods to the Cancer Genome Atlas data to study the genetic and epigenetic pathways that underlie serous ovarian cancer. |
format | Online Article Text |
id | pubmed-5397279 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | SAGE Publications |
record_format | MEDLINE/PubMed |
spelling | pubmed-5397279 2017-05-03 A mixture copula Bayesian network model for multimodal genomic data Zhang, Qingyang Shi, Xuan Cancer Inform Methodology
SAGE Publications 2017-04-12 /pmc/articles/PMC5397279/ /pubmed/28469391 http://dx.doi.org/10.1177/1176935117702389 Text en © The Author(s) 2017 This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
spellingShingle | Methodology Zhang, Qingyang Shi, Xuan A mixture copula Bayesian network model for multimodal genomic data |
title | A mixture copula Bayesian network model for multimodal genomic data |
title_full | A mixture copula Bayesian network model for multimodal genomic data |
title_fullStr | A mixture copula Bayesian network model for multimodal genomic data |
title_full_unstemmed | A mixture copula Bayesian network model for multimodal genomic data |
title_short | A mixture copula Bayesian network model for multimodal genomic data |
title_sort | mixture copula bayesian network model for multimodal genomic data |
topic | Methodology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5397279/ https://www.ncbi.nlm.nih.gov/pubmed/28469391 http://dx.doi.org/10.1177/1176935117702389 |
work_keys_str_mv | AT zhangqingyang amixturecopulabayesiannetworkmodelformultimodalgenomicdata AT shixuan amixturecopulabayesiannetworkmodelformultimodalgenomicdata AT zhangqingyang mixturecopulabayesiannetworkmodelformultimodalgenomicdata AT shixuan mixturecopulabayesiannetworkmodelformultimodalgenomicdata |