
Molecular pathway identification using biological network-regularized logistic models

BACKGROUND: Selecting genes and pathways indicative of disease is a central problem in computational biology. This problem is especially challenging when parsing multi-dimensional genomic data. A number of tools, such as L(1)-norm based regularization and its extensions elastic net and fused lasso,...


Bibliographic Details
Main Authors: Zhang, Wen; Wan, Ying-wooi; Allen, Genevera I; Pang, Kaifang; Anderson, Matthew L; Liu, Zhandong
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046566/
https://www.ncbi.nlm.nih.gov/pubmed/24564637
http://dx.doi.org/10.1186/1471-2164-14-S8-S7
author Zhang, Wen
Wan, Ying-wooi
Allen, Genevera I
Pang, Kaifang
Anderson, Matthew L
Liu, Zhandong
author_sort Zhang, Wen
collection PubMed
description BACKGROUND: Selecting genes and pathways indicative of disease is a central problem in computational biology. This problem is especially challenging when parsing multi-dimensional genomic data. A number of tools, such as L(1)-norm based regularization and its extensions elastic net and fused lasso, have been introduced to deal with this challenge. However, these approaches tend to ignore the vast amount of a priori biological network information curated in the literature. RESULTS: We propose the use of graph Laplacian regularized logistic regression to integrate biological networks into disease classification and pathway association problems. Simulation studies demonstrate that the performance of the proposed algorithm is superior to elastic net and lasso analyses. Utility of this algorithm is also validated by its ability to reliably differentiate breast cancer subtypes using a large breast cancer dataset recently generated by the Cancer Genome Atlas (TCGA) consortium. Many of the protein-protein interaction modules identified by our approach are further supported by evidence published in the literature. Source code of the proposed algorithm is freely available at http://www.github.com/zhandong/Logit-Lapnet. CONCLUSION: Logistic regression with graph Laplacian regularization is an effective algorithm for identifying key pathways and modules associated with disease subtypes. With the rapid expansion of our knowledge of biological regulatory networks, this approach will become more accurate and increasingly useful for mining transcriptomic, epi-genomic, and other types of genome wide association studies.
format Online
Article
Text
id pubmed-4046566
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-40465662014-06-05 Molecular pathway identification using biological network-regularized logistic models Zhang, Wen Wan, Ying-wooi Allen, Genevera I Pang, Kaifang Anderson, Matthew L Liu, Zhandong BMC Genomics Research BACKGROUND: Selecting genes and pathways indicative of disease is a central problem in computational biology. This problem is especially challenging when parsing multi-dimensional genomic data. A number of tools, such as L(1)-norm based regularization and its extensions elastic net and fused lasso, have been introduced to deal with this challenge. However, these approaches tend to ignore the vast amount of a priori biological network information curated in the literature. RESULTS: We propose the use of graph Laplacian regularized logistic regression to integrate biological networks into disease classification and pathway association problems. Simulation studies demonstrate that the performance of the proposed algorithm is superior to elastic net and lasso analyses. Utility of this algorithm is also validated by its ability to reliably differentiate breast cancer subtypes using a large breast cancer dataset recently generated by the Cancer Genome Atlas (TCGA) consortium. Many of the protein-protein interaction modules identified by our approach are further supported by evidence published in the literature. Source code of the proposed algorithm is freely available at http://www.github.com/zhandong/Logit-Lapnet. CONCLUSION: Logistic regression with graph Laplacian regularization is an effective algorithm for identifying key pathways and modules associated with disease subtypes. With the rapid expansion of our knowledge of biological regulatory networks, this approach will become more accurate and increasingly useful for mining transcriptomic, epi-genomic, and other types of genome wide association studies. 
BioMed Central 2013-12-09 /pmc/articles/PMC4046566/ /pubmed/24564637 http://dx.doi.org/10.1186/1471-2164-14-S8-S7 Text en Copyright © 2013 Zhang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Molecular pathway identification using biological network-regularized logistic models
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4046566/
https://www.ncbi.nlm.nih.gov/pubmed/24564637
http://dx.doi.org/10.1186/1471-2164-14-S8-S7
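
The abstract in this record names graph Laplacian regularized logistic regression but does not state the objective function. The following is a minimal sketch, assuming one common form of the penalty: the mean logistic loss plus lam_graph * b^T L b (where L = D - A is the unnormalized Laplacian of the gene network) plus an optional L1 term lam_l1 * ||b||_1. The L1 term, the proximal gradient solver, and all function and parameter names are assumptions made for illustration; this is not the authors' Logit-Lapnet implementation.

```python
import numpy as np


def graph_laplacian(adjacency):
    """Unnormalized graph Laplacian L = D - A for a symmetric adjacency matrix."""
    adjacency = np.asarray(adjacency, dtype=float)
    degree = np.diag(adjacency.sum(axis=1))
    return degree - adjacency


def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise shrinkage toward zero)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)


def fit_laplacian_logistic(X, y, laplacian, lam_l1=0.01, lam_graph=0.1, n_iter=2000):
    """Proximal gradient descent on the assumed objective

        mean logistic loss + lam_graph * b^T L b + lam_l1 * ||b||_1

    with labels y in {0, 1}. The intercept is omitted for brevity.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Lipschitz bound for the gradient of the smooth part:
    # ||X||_2^2 / (4n) for the logistic loss, 2 * lam_graph * lambda_max(L) for the graph term.
    lipschitz = (np.linalg.norm(X, 2) ** 2 / (4.0 * n)
                 + 2.0 * lam_graph * np.linalg.eigvalsh(laplacian)[-1])
    step = 1.0 / lipschitz
    beta = np.zeros(p)
    for _ in range(n_iter):
        # Clip the linear predictor to avoid overflow in exp.
        z = np.clip(X @ beta, -30.0, 30.0)
        prob = 1.0 / (1.0 + np.exp(-z))
        # Gradient of the smooth part: logistic loss plus Laplacian penalty.
        grad = X.T @ (prob - y) / n + 2.0 * lam_graph * (laplacian @ beta)
        # Gradient step followed by the L1 proximal step.
        beta = soft_threshold(beta - step * grad, step * lam_l1)
    return beta
```

Under these assumptions, a call such as fit_laplacian_logistic(X, y, graph_laplacian(A)) takes an expression matrix X (samples by genes), binary subtype labels y, and a gene-gene adjacency matrix A. The Laplacian term pulls coefficients of connected genes toward each other, which is how network structure enters the fit; setting lam_l1 to zero leaves a purely Laplacian-penalized model.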