BMRF-MI: integrative identification of protein interaction network by modeling the gene dependency

Bibliographic Details
Main authors: Shi, Xu; Wang, Xiao; Shajahan, Ayesha; Hilakivi-Clarke, Leena; Clarke, Robert; Xuan, Jianhua
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4474537/
https://www.ncbi.nlm.nih.gov/pubmed/26099273
http://dx.doi.org/10.1186/1471-2164-16-S7-S10
collection PubMed
description BACKGROUND: Identifying protein interaction networks is an important step toward understanding the molecular mechanisms of cancer. Several methods have been developed to integrate protein-protein interaction (PPI) data with gene expression data for network identification. However, they often fail to model the dependency between genes in the network, which leaves many important genes, especially upstream genes, unidentified. A method is therefore needed that improves network identification by incorporating the dependency between genes.
RESULTS: We propose an approach for identifying protein interaction networks by incorporating mutual information (MI) into a Markov random field (MRF) based framework to model the dependency between genes. MI is widely used in information theory to measure the dependence between random variables, that is, how much knowing one variable reduces the uncertainty about the other. Unlike the traditional Pearson correlation test, MI can capture both linear and non-linear relationships between random variables. Among the existing MI estimators, we use the k-nearest neighbor MI (kNN-MI) estimator, which has been shown to have minimal bias. The estimated MI is integrated into the MRF framework to model gene dependency in the context of the network. Maximum a posteriori (MAP) estimation is applied to the MRF-based model to estimate the network score. To reduce the computational complexity of finding the optimal network, a probabilistic search algorithm is implemented. We further increase the robustness and reproducibility of the results by applying a non-parametric bootstrapping method to measure the confidence level of the identified genes. To evaluate the performance of the proposed method, we tested it on simulated data under different conditions. The experimental results show improved accuracy in subnetwork identification compared to existing methods. Furthermore, we applied our method to real breast cancer patient data; the identified protein interaction network shows a close association with breast cancer recurrence, which is supported by functional annotation. We also show that the identified subnetworks can be used to predict the recurrence status of cancer patients by survival analysis.
CONCLUSIONS: We have developed an integrated approach for protein interaction network identification that combines a Markov random field framework with mutual information to model gene dependency in the PPI network. Improvements in subnetwork identification over existing methods have been demonstrated on simulated datasets. We then applied our method to breast cancer patient data to identify recurrence-related subnetworks. The experimental results show that the identified genes are enriched in pathways and functional categories relevant to the progression and recurrence of breast cancer. Finally, survival analysis based on the identified subnetworks performs well in classifying the recurrence status of cancer patients.
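The record states only that BMRF-MI uses a k-nearest neighbor MI (kNN-MI) estimator. The sketch below assumes this is the widely used Kraskov, Stögbauer and Grassberger (KSG) estimator (their algorithm 1), Î(X;Y) = ψ(k) + ψ(N) − ⟨ψ(n_x + 1) + ψ(n_y + 1)⟩, where ψ is the digamma function, ε_i is the max-norm distance from sample i to its k-th nearest neighbor in the joint (x, y) space, and n_x, n_y count the other samples lying strictly within ε_i in each marginal. The function names (`digamma_int`, `knn_mutual_information`, `pearson`) are hypothetical; this is an illustrative brute-force sketch, not the authors' implementation. It also contrasts MI with Pearson correlation on a quadratic relationship, the non-linear case the abstract highlights.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329  # Euler-Mascheroni constant


def digamma_int(n: int) -> float:
    """Digamma at a positive integer: psi(n) = -gamma + H_(n-1)."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    return -EULER_GAMMA + sum(1.0 / m for m in range(1, n))


def knn_mutual_information(x, y, k=3):
    """Assumed KSG (algorithm 1) estimate of I(X;Y) in nats; brute force."""
    n = len(x)
    if n != len(y):
        raise ValueError("x and y must have the same length")
    if not 0 < k < n:
        raise ValueError("k must satisfy 0 < k < len(x)")
    acc = 0.0
    for i in range(n):
        # Max-norm distances from sample i to every other sample in joint space.
        joint = sorted(
            max(abs(x[i] - x[j]), abs(y[i] - y[j])) for j in range(n) if j != i
        )
        eps = joint[k - 1]  # distance to the k-th nearest neighbor
        # Count the other samples strictly closer than eps in each marginal.
        nx = sum(1 for j in range(n) if j != i and abs(x[i] - x[j]) < eps)
        ny = sum(1 for j in range(n) if j != i and abs(y[i] - y[j]) < eps)
        acc += digamma_int(nx + 1) + digamma_int(ny + 1)
    return digamma_int(k) + digamma_int(n) - acc / n


def pearson(x, y):
    """Sample Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.uniform(-1, 1) for _ in range(300)]
    y_quad = [xi ** 2 + rng.gauss(0, 0.05) for xi in x]  # non-linear dependence
    y_ind = [rng.uniform(-1, 1) for _ in range(300)]  # independent of x
    # Pearson r is expected near 0 here even though y_quad depends on x.
    print("Pearson r(x, x^2 + noise):", pearson(x, y_quad))
    # kNN-MI is expected clearly positive for y_quad and near 0 for y_ind.
    print("kNN-MI(x, x^2 + noise):", knn_mutual_information(x, y_quad))
    print("kNN-MI(x, independent):", knn_mutual_information(x, y_ind))
```

Because ψ is only evaluated at positive integers in this estimator, it is computed exactly from harmonic numbers, so no third-party library is needed. The O(N²) neighbor search is only practical for small samples; a k-d tree is the usual choice for genome-scale expression data.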
id pubmed-4474537
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-4474537 2015-06-25 BMC Genomics Research BioMed Central 2015-06-11 Text en Copyright © 2015 Shi et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
topic Research