PCfun: a hybrid computational framework for systematic characterization of protein complex function
In molecular biology, it is a general assumption that the ensemble of expressed molecules, their activities and interactions determine biological function, cellular states and phenotypes. Stable protein complexes—or macromolecular machines—are, in turn, the key functional entities mediating and modu...
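Main authors: | Sharma, Varun S; Fossati, Andrea; Ciuffa, Rodolfo; Buljan, Marija; Williams, Evan G; Chen, Zhen; Shao, Wenguang; Pedrioli, Patrick G A; Purcell, Anthony W; Martínez, María Rodríguez; Song, Jiangning; Manica, Matteo; Aebersold, Ruedi; Li, Chen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2022 |
Subjects: | Problem Solving Protocol |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310514/ https://www.ncbi.nlm.nih.gov/pubmed/35724564 http://dx.doi.org/10.1093/bib/bbac239 |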
_version_ | 1784753400333729792 |
---|---|
author | Sharma, Varun S Fossati, Andrea Ciuffa, Rodolfo Buljan, Marija Williams, Evan G Chen, Zhen Shao, Wenguang Pedrioli, Patrick G A Purcell, Anthony W Martínez, María Rodríguez Song, Jiangning Manica, Matteo Aebersold, Ruedi Li, Chen |
author_facet | Sharma, Varun S Fossati, Andrea Ciuffa, Rodolfo Buljan, Marija Williams, Evan G Chen, Zhen Shao, Wenguang Pedrioli, Patrick G A Purcell, Anthony W Martínez, María Rodríguez Song, Jiangning Manica, Matteo Aebersold, Ruedi Li, Chen |
author_sort | Sharma, Varun S |
collection | PubMed |
description | In molecular biology, it is a general assumption that the ensemble of expressed molecules, their activities and interactions determine biological function, cellular states and phenotypes. Stable protein complexes—or macromolecular machines—are, in turn, the key functional entities mediating and modulating most biological processes. Although identifying protein complexes and their subunit composition can now be done inexpensively and at scale, determining their function remains challenging and labor intensive. This study describes Protein Complex Function predictor (PCfun), the first computational framework for the systematic annotation of protein complex functions using Gene Ontology (GO) terms. PCfun is built upon a word embedding using natural language processing techniques based on 1 million open access PubMed Central articles. Specifically, PCfun leverages two approaches for accurately identifying protein complex function, including: (i) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector and (ii) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database. PCfun consolidates both approaches by performing a hypergeometric statistical test to enrich the top NN GO terms within the child terms of the GO terms predicted by the RF models. The documentation and implementation of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function. |
format | Online Article Text |
id | pubmed-9310514 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-9310514 2022-07-26 PCfun: a hybrid computational framework for systematic characterization of protein complex function Sharma, Varun S Fossati, Andrea Ciuffa, Rodolfo Buljan, Marija Williams, Evan G Chen, Zhen Shao, Wenguang Pedrioli, Patrick G A Purcell, Anthony W Martínez, María Rodríguez Song, Jiangning Manica, Matteo Aebersold, Ruedi Li, Chen Brief Bioinform Problem Solving Protocol In molecular biology, it is a general assumption that the ensemble of expressed molecules, their activities and interactions determine biological function, cellular states and phenotypes. Stable protein complexes—or macromolecular machines—are, in turn, the key functional entities mediating and modulating most biological processes. Although identifying protein complexes and their subunit composition can now be done inexpensively and at scale, determining their function remains challenging and labor intensive. This study describes Protein Complex Function predictor (PCfun), the first computational framework for the systematic annotation of protein complex functions using Gene Ontology (GO) terms. PCfun is built upon a word embedding using natural language processing techniques based on 1 million open access PubMed Central articles. Specifically, PCfun leverages two approaches for accurately identifying protein complex function, including: (i) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector and (ii) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database. PCfun consolidates both approaches by performing a hypergeometric statistical test to enrich the top NN GO terms within the child terms of the GO terms predicted by the RF models. The documentation and implementation of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function. Oxford University Press 2022-06-21 /pmc/articles/PMC9310514/ /pubmed/35724564 http://dx.doi.org/10.1093/bib/bbac239 Text en © The Author(s) 2022. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Problem Solving Protocol Sharma, Varun S Fossati, Andrea Ciuffa, Rodolfo Buljan, Marija Williams, Evan G Chen, Zhen Shao, Wenguang Pedrioli, Patrick G A Purcell, Anthony W Martínez, María Rodríguez Song, Jiangning Manica, Matteo Aebersold, Ruedi Li, Chen PCfun: a hybrid computational framework for systematic characterization of protein complex function |
title | PCfun: a hybrid computational framework for systematic characterization of protein complex function |
title_full | PCfun: a hybrid computational framework for systematic characterization of protein complex function |
title_fullStr | PCfun: a hybrid computational framework for systematic characterization of protein complex function |
title_full_unstemmed | PCfun: a hybrid computational framework for systematic characterization of protein complex function |
title_short | PCfun: a hybrid computational framework for systematic characterization of protein complex function |
title_sort | pcfun: a hybrid computational framework for systematic characterization of protein complex function |
topic | Problem Solving Protocol |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310514/ https://www.ncbi.nlm.nih.gov/pubmed/35724564 http://dx.doi.org/10.1093/bib/bbac239 |
work_keys_str_mv | AT sharmavaruns pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT fossatiandrea pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT ciuffarodolfo pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT buljanmarija pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT williamsevang pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT chenzhen pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT shaowenguang pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT pedriolipatrickga pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT purcellanthonyw pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT martinezmariarodriguez pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT songjiangning pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT manicamatteo pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT aebersoldruedi pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction AT lichen pcfunahybridcomputationalframeworkforsystematiccharacterizationofproteincomplexfunction |
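
The abstract in the description field outlines two steps that can be illustrated on toy data: ranking GO term vectors by proximity to a complex query vector, and a hypergeometric test of how many top-ranked terms fall among the child terms of a predicted GO term. The following is a minimal sketch under stated assumptions, not the PCfun implementation: cosine similarity is assumed as the nearest-neighbor measure, the Random Forest step is omitted, and every name and value (`nearest_terms`, `hypergeom_sf`, `term_a` ... `term_f`, the two-dimensional vectors, the set of child terms) is hypothetical.

```python
from math import comb, sqrt


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (assumed metric)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def nearest_terms(query, term_vectors, k):
    """Return the k term names whose vectors are most similar to the query."""
    ranked = sorted(
        term_vectors,
        key=lambda name: cosine(query, term_vectors[name]),
        reverse=True,
    )
    return ranked[:k]


def hypergeom_sf(overlap, population, successes, draws):
    """P(X >= overlap) for X ~ Hypergeometric(population, successes, draws)."""
    lower = max(overlap, 0)
    upper = min(successes, draws)
    favourable = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lower, upper + 1)
    )
    return favourable / comb(population, draws)


# Toy data: six hypothetical GO term vectors and one complex query vector.
toy_term_vectors = {
    "term_a": (1.0, 0.0),
    "term_b": (0.9, 0.1),
    "term_c": (0.8, 0.3),
    "term_d": (0.0, 1.0),
    "term_e": (0.1, 0.9),
    "term_f": (-1.0, 0.0),
}
toy_query = (1.0, 0.05)

# Hypothetical child terms of a GO term predicted by a supervised model.
toy_children = {"term_a", "term_b", "term_c"}

top_terms = nearest_terms(toy_query, toy_term_vectors, k=3)
overlap = len(toy_children.intersection(top_terms))
p_value = hypergeom_sf(
    overlap,
    population=len(toy_term_vectors),
    successes=len(toy_children),
    draws=len(top_terms),
)
```

On this toy input the three nearest terms all lie among the assumed child terms, so the tail probability is 1/20. A small value of this kind would indicate that the nearest-neighbor terms are concentrated under the predicted term more than expected by chance; how PCfun thresholds or corrects such values is not stated in this record.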