
Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO

BACKGROUND: Conventional differential gene expression analysis by methods such as Student’s t-test, SAM, and Empirical Bayes often searches for statistically significant genes without considering the interactions among them. Network-based approaches provide a natural way to study these interactions...


Bibliographic Details
Main Authors: Zuo, Yiming, Cui, Yi, Yu, Guoqiang, Li, Ruijiang, Ressom, Habtom W.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5303311/
https://www.ncbi.nlm.nih.gov/pubmed/28187708
http://dx.doi.org/10.1186/s12859-017-1515-1
author Zuo, Yiming
Cui, Yi
Yu, Guoqiang
Li, Ruijiang
Ressom, Habtom W.
collection PubMed
description BACKGROUND: Conventional differential gene expression analysis by methods such as Student’s t-test, SAM, and Empirical Bayes often searches for statistically significant genes without considering the interactions among them. Network-based approaches provide a natural way to study these interactions and to investigate the rewiring of interactions in disease versus control groups. In this paper, we apply the weighted graphical LASSO (wgLASSO) algorithm to integrate a data-driven network model with prior biological knowledge (i.e., protein-protein interactions) for biological network inference. We propose a novel differentially weighted graphical LASSO (dwgLASSO) algorithm that builds group-specific networks and performs network-based differential gene expression analysis to select biomarker candidates by considering their topological differences between the groups. RESULTS: Through simulation, we showed that wgLASSO can achieve better performance in building biologically relevant networks than purely data-driven models (e.g., neighbor selection, graphical LASSO), even when only a moderate level of information is available as prior biological knowledge. We evaluated the performance of dwgLASSO for survival time prediction using two microarray breast cancer datasets previously reported by Bild et al. and van de Vijver et al. Compared with the top 10 significant genes selected by a conventional differential gene expression analysis method, the top 10 significant genes selected by dwgLASSO in the dataset from Bild et al. led to a significantly improved survival time prediction in the independent dataset from van de Vijver et al. Among the 10 genes selected by dwgLASSO, UBE2S, SALL2, XBP1 and KIAA0922 have been confirmed by literature survey to be highly relevant to breast cancer biomarker discovery.
Additionally, we tested dwgLASSO on TCGA RNA-seq data acquired from tumor samples and the corresponding non-tumorous liver tissues of patients with hepatocellular carcinoma (HCC). Improved sensitivity, specificity and area under the curve (AUC) were observed when comparing dwgLASSO with a conventional differential gene expression analysis method. CONCLUSIONS: The proposed network-based differential gene expression analysis algorithm dwgLASSO can achieve better performance than conventional differential gene expression analysis methods by integrating information at both gene expression and network topology levels. The incorporation of prior biological knowledge can lead to the identification of biologically meaningful genes in cancer biomarker studies. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1515-1) contains supplementary material, which is available to authorized users.
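The abstract describes wgLASSO as a graphical LASSO whose penalty is informed by prior knowledge such as protein-protein interactions. This record does not give the objective function, so the following is only a minimal sketch under stated assumptions: it assumes the common weighted form, minimizing -logdet(Theta) + tr(S Theta) + lam * sum_ij W_ij |Theta_ij|, and solves it with a generic ADMM scheme rather than the authors' own solver. The function name weighted_glasso, the weight convention (W_ij = 0 for an edge fully supported by prior knowledge, 1 for no support, 0 on the diagonal), and the toy covariance matrix are all hypothetical.

```python
import numpy as np


def weighted_glasso(S, W, lam, rho=1.0, n_iter=500):
    """ADMM sketch for a weighted graphical LASSO (assumed objective):
        minimize -logdet(Theta) + tr(S @ Theta) + lam * sum_ij W[i, j] * |Theta[i, j]|
    S: sample covariance matrix; W: non-negative penalty weights (hypothetical
    convention: smaller weight = stronger prior support for that edge).
    Returns the sparse precision-matrix estimate Z.
    """
    p = S.shape[0]
    Z = np.zeros((p, p))
    U = np.zeros((p, p))
    thresh = lam * W / rho  # elementwise soft-threshold levels
    for _ in range(n_iter):
        # Theta-update: solve rho*Theta - inv(Theta) = rho*(Z - U) - S
        # in closed form through an eigendecomposition.
        A = rho * (Z - U) - S
        eigval, Q = np.linalg.eigh((A + A.T) / 2.0)
        theta_eig = (eigval + np.sqrt(eigval ** 2 + 4.0 * rho)) / (2.0 * rho)
        Theta = (Q * theta_eig) @ Q.T
        # Z-update: elementwise soft-thresholding with per-entry thresholds.
        V = Theta + U
        Z = np.sign(V) * np.maximum(np.abs(V) - thresh, 0.0)
        # Scaled dual update.
        U = U + Theta - Z
    return Z


# Toy two-gene example (made-up covariance, for illustration only).
S = np.array([[1.0, 0.5], [0.5, 1.0]])
W_no_prior = np.array([[0.0, 1.0], [1.0, 0.0]])  # edge fully penalized
W_prior = np.zeros((2, 2))                       # edge supported by prior

edge_without_prior = weighted_glasso(S, W_no_prior, lam=10.0)[0, 1]
edge_with_prior = weighted_glasso(S, W_prior, lam=10.0)[0, 1]
print(edge_without_prior, edge_with_prior)
```

In this toy setting the same strong penalty removes the edge when the prior gives it no support and leaves it unshrunk when the prior fully supports it, which is the intuition behind letting prior knowledge modulate the penalty.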
format Online
Article
Text
id pubmed-5303311
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5303311 2017-02-15 Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO Zuo, Yiming Cui, Yi Yu, Guoqiang Li, Ruijiang Ressom, Habtom W. BMC Bioinformatics Methodology Article BioMed Central 2017-02-10 /pmc/articles/PMC5303311/ /pubmed/28187708 http://dx.doi.org/10.1186/s12859-017-1515-1 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Incorporating prior biological knowledge for network-based differential gene expression analysis using differentially weighted graphical LASSO
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5303311/
https://www.ncbi.nlm.nih.gov/pubmed/28187708
http://dx.doi.org/10.1186/s12859-017-1515-1