A mixture copula Bayesian network model for multimodal genomic data

Gaussian Bayesian networks have become a widely used framework to estimate directed associations between jointly Gaussian variables, where the network structure encodes the decomposition of the multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normal...

Bibliographic Details
Main Authors: Zhang, Qingyang; Shi, Xuan
Format: Online Article Text
Language: English
Published: SAGE Publications 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5397279/
https://www.ncbi.nlm.nih.gov/pubmed/28469391
http://dx.doi.org/10.1177/1176935117702389
author Zhang, Qingyang
Shi, Xuan
collection PubMed
description Gaussian Bayesian networks have become a widely used framework to estimate directed associations between jointly Gaussian variables, where the network structure encodes the decomposition of the multivariate normal density into local terms. However, the resulting estimates can be inaccurate when the normality assumption is moderately or severely violated, which makes the framework unsuitable for recent genomic data such as the Cancer Genome Atlas data. In the present paper, we propose a mixture copula Bayesian network model that provides great flexibility in modeling non-Gaussian and multimodal data for causal inference. The parameters of the mixture copula functions can be efficiently estimated by a routine expectation–maximization algorithm. A heuristic search algorithm based on the Bayesian information criterion is developed to estimate the network structure, and prediction can be further improved by taking the best-scoring network out of multiple predictions from random initial values. Our method outperforms Gaussian Bayesian networks and regular copula Bayesian networks in terms of modeling flexibility and prediction accuracy, as demonstrated using a cell signaling data set. We apply the proposed methods to the Cancer Genome Atlas data to study the genetic and epigenetic pathways that underlie serous ovarian cancer.
format Online
Article
Text
id pubmed-5397279
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-5397279 2017-05-03 A mixture copula Bayesian network model for multimodal genomic data Zhang, Qingyang Shi, Xuan Cancer Inform Methodology
SAGE Publications 2017-04-12 /pmc/articles/PMC5397279/ /pubmed/28469391 http://dx.doi.org/10.1177/1176935117702389 Text en © The Author(s) 2017 This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title A mixture copula Bayesian network model for multimodal genomic data
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5397279/
https://www.ncbi.nlm.nih.gov/pubmed/28469391
http://dx.doi.org/10.1177/1176935117702389