PCfun: a hybrid computational framework for systematic characterization of protein complex function

In molecular biology, it is a general assumption that the ensemble of expressed molecules, their activities and interactions determine biological function, cellular states and phenotypes. Stable protein complexes—or macromolecular machines—are, in turn, the key functional entities mediating and modu...

Bibliographic Details
Main Authors: Sharma, Varun S; Fossati, Andrea; Ciuffa, Rodolfo; Buljan, Marija; Williams, Evan G; Chen, Zhen; Shao, Wenguang; Pedrioli, Patrick G A; Purcell, Anthony W; Martínez, María Rodríguez; Song, Jiangning; Manica, Matteo; Aebersold, Ruedi; Li, Chen
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310514/
https://www.ncbi.nlm.nih.gov/pubmed/35724564
http://dx.doi.org/10.1093/bib/bbac239
author Sharma, Varun S
Fossati, Andrea
Ciuffa, Rodolfo
Buljan, Marija
Williams, Evan G
Chen, Zhen
Shao, Wenguang
Pedrioli, Patrick G A
Purcell, Anthony W
Martínez, María Rodríguez
Song, Jiangning
Manica, Matteo
Aebersold, Ruedi
Li, Chen
author_sort Sharma, Varun S
collection PubMed
description In molecular biology, it is a general assumption that the ensemble of expressed molecules, their activities and interactions determine biological function, cellular states and phenotypes. Stable protein complexes—or macromolecular machines—are, in turn, the key functional entities mediating and modulating most biological processes. Although identifying protein complexes and their subunit composition can now be done inexpensively and at scale, determining their function remains challenging and labor intensive. This study describes Protein Complex Function predictor (PCfun), the first computational framework for the systematic annotation of protein complex functions using Gene Ontology (GO) terms. PCfun is built upon a word embedding using natural language processing techniques based on 1 million open access PubMed Central articles. Specifically, PCfun leverages two approaches for accurately identifying protein complex function, including: (i) an unsupervised approach that obtains the nearest neighbor (NN) GO term word vectors for a protein complex query vector and (ii) a supervised approach using Random Forest (RF) models trained specifically for recovering the GO terms of protein complex queries described in the CORUM protein complex database. PCfun consolidates both approaches by performing a hypergeometric statistical test to enrich the top NN GO terms within the child terms of the GO terms predicted by the RF models. The documentation and implementation of the PCfun package are available at https://github.com/sharmavaruns/PCfun. We anticipate that PCfun will serve as a useful tool and novel paradigm for the large-scale characterization of protein complex function.
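The consolidation step described in the abstract can be illustrated with a small sketch. It is not taken from the PCfun package: the function names, the toy two-dimensional vectors, the placeholder GO identifiers and the set of child terms are all hypothetical, and the Random Forest step is not reproduced (its output is assumed to be a predicted GO term whose child terms are already known). The sketch ranks GO term vectors by cosine similarity to a query vector, then uses a hypergeometric test to ask whether the top nearest neighbors are over-represented among those child terms.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_go_terms(query_vec, go_vectors, k):
    """Return the k GO terms whose vectors are most similar to the query."""
    ranked = sorted(
        go_vectors,
        key=lambda term: cosine(query_vec, go_vectors[term]),
        reverse=True,
    )
    return ranked[:k]


def hypergeom_sf(overlap, population, successes, draws):
    """P(X >= overlap) for X ~ Hypergeometric(population, successes, draws)."""
    total = math.comb(population, draws)
    tail = sum(
        math.comb(successes, i) * math.comb(population - successes, draws - i)
        for i in range(overlap, min(successes, draws) + 1)
    )
    return tail / total


# Hypothetical toy embedding: five placeholder GO terms in two dimensions.
go_vectors = {
    "GO:toy_A": [0.9, 0.1],
    "GO:toy_B": [0.8, 0.3],
    "GO:toy_C": [0.1, 0.9],
    "GO:toy_D": [0.0, 1.0],
    "GO:toy_E": [0.7, 0.7],
}
query = [1.0, 0.0]  # assumed vector for a protein complex query

# Assumed child terms of a GO term predicted by the supervised model.
children_of_predicted = {"GO:toy_A", "GO:toy_B"}

top = nearest_go_terms(query, go_vectors, k=2)
overlap = len(set(top) & children_of_predicted)
p_value = hypergeom_sf(
    overlap,
    population=len(go_vectors),
    successes=len(children_of_predicted),
    draws=len(top),
)
```

In this toy case both nearest neighbors fall among the two child terms, so the test gives 1 / C(5, 2) = 0.1. With a realistic GO vocabulary the population would be far larger, and the same overlap would yield a much smaller p-value.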
format Online
Article
Text
id pubmed-9310514
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9310514 2022-07-26 PCfun: a hybrid computational framework for systematic characterization of protein complex function Brief Bioinform Problem Solving Protocol Oxford University Press 2022-06-21 /pmc/articles/PMC9310514/ /pubmed/35724564 http://dx.doi.org/10.1093/bib/bbac239 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title PCfun: a hybrid computational framework for systematic characterization of protein complex function
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9310514/
https://www.ncbi.nlm.nih.gov/pubmed/35724564
http://dx.doi.org/10.1093/bib/bbac239