
PCLassoLog: A protein complex-based, group Lasso-logistic model for cancer classification and risk protein complex discovery

Risk gene identification has attracted much attention in the past two decades. Since most genes need to be translated into proteins and cooperate with other proteins to form protein complexes to carry out cellular functions, which significantly extends the functional diversity of individual proteins...


Bibliographic Details
Main authors: Wang, Wei; Yuan, Haiyan; Han, Junwei; Liu, Wei
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9791601/
https://www.ncbi.nlm.nih.gov/pubmed/36582441
http://dx.doi.org/10.1016/j.csbj.2022.12.005
author Wang, Wei
Yuan, Haiyan
Han, Junwei
Liu, Wei
author_sort Wang, Wei
collection PubMed
description Risk gene identification has attracted much attention in the past two decades. Since most genes need to be translated into proteins and cooperate with other proteins to form protein complexes to carry out cellular functions, which significantly extends the functional diversity of individual proteins, revealing the molecular mechanism of cancer from a comprehensive perspective needs to shift from identifying individual risk genes toward identifying risk protein complexes. Here, we embed protein complexes into the regularized learning framework and propose a protein complex-based, group Lasso-logistic model (PCLassoLog) to discover risk protein complexes. Experiments on deep proteomic data of two cancer types show that PCLassoLog yields superior predictive performance on independent datasets. More importantly, PCLassoLog identifies risk protein complexes that not only contain individual risk proteins but also incorporate close partners that synergize with them. Furthermore, selection probabilities are calculated and two other protein complex-based models are proposed to complement PCLassoLog in identifying reliable risk protein complexes. Based on PCLassoLog, a pan-cancer analysis is performed to identify risk protein complexes in 12 cancer types. Finally, PCLassoLog is used to discover risk protein complexes associated with gene mutation. We implement all protein complex-based models as an R package PCLassoReg, which may serve as an effective tool to discover risk protein complexes in various contexts.
format Online
Article
Text
id pubmed-9791601
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-97916012022-12-28 PCLassoLog: A protein complex-based, group Lasso-logistic model for cancer classification and risk protein complex discovery Wang, Wei Yuan, Haiyan Han, Junwei Liu, Wei Comput Struct Biotechnol J Research Article Risk gene identification has attracted much attention in the past two decades. Since most genes need to be translated into proteins and cooperate with other proteins to form protein complexes to carry out cellular functions, which significantly extends the functional diversity of individual proteins, revealing the molecular mechanism of cancer from a comprehensive perspective needs to shift from identifying individual risk genes toward identifying risk protein complexes. Here, we embed protein complexes into the regularized learning framework and propose a protein complex-based, group Lasso-logistic model (PCLassoLog) to discover risk protein complexes. Experiments on deep proteomic data of two cancer types show that PCLassoLog yields superior predictive performance on independent datasets. More importantly, PCLassoLog identifies risk protein complexes that not only contain individual risk proteins but also incorporate close partners that synergize with them. Furthermore, selection probabilities are calculated and two other protein complex-based models are proposed to complement PCLassoLog in identifying reliable risk protein complexes. Based on PCLassoLog, a pan-cancer analysis is performed to identify risk protein complexes in 12 cancer types. Finally, PCLassoLog is used to discover risk protein complexes associated with gene mutation. We implement all protein complex-based models as an R package PCLassoReg, which may serve as an effective tool to discover risk protein complexes in various contexts. 
Research Network of Computational and Structural Biotechnology 2022-12-06 /pmc/articles/PMC9791601/ /pubmed/36582441 http://dx.doi.org/10.1016/j.csbj.2022.12.005 Text en © 2022 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title PCLassoLog: A protein complex-based, group Lasso-logistic model for cancer classification and risk protein complex discovery
title_sort pclassolog: a protein complex-based, group lasso-logistic model for cancer classification and risk protein complex discovery
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9791601/
https://www.ncbi.nlm.nih.gov/pubmed/36582441
http://dx.doi.org/10.1016/j.csbj.2022.12.005