Explainable protein function annotation using local structure embeddings

The rapid expansion of protein sequence and structure databases has resulted in a significant number of proteins with ambiguous or unknown function. While advances in machine learning techniques hold great potential to fill this annotation gap, current methods for function prediction are unable to a...

Bibliographic Details
Main authors: Derry, Alexander; Altman, Russ B.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10614799/
https://www.ncbi.nlm.nih.gov/pubmed/37905033
http://dx.doi.org/10.1101/2023.10.13.562298
_version_ 1785129102942928896
author Derry, Alexander
Altman, Russ B.
author_facet Derry, Alexander
Altman, Russ B.
author_sort Derry, Alexander
collection PubMed
description The rapid expansion of protein sequence and structure databases has resulted in a significant number of proteins with ambiguous or unknown function. While advances in machine learning techniques hold great potential to fill this annotation gap, current methods for function prediction are unable to reliably associate global function with the specific residues responsible for that function. We address this issue by introducing PARSE (Protein Annotation by Residue-Specific Enrichment), a knowledge-based method that combines pre-trained embeddings of local structural environments with traditional statistical techniques to identify enriched functions with residue-level explainability. For the task of predicting the catalytic function of enzymes, PARSE achieves comparable or superior global performance to state-of-the-art machine learning methods (F1 score > 85%) while simultaneously annotating the specific residues involved in each function with much greater precision. Since it does not require supervised training, our method can make one-shot predictions for very rare functions and is not limited to a particular type of functional label (e.g., Enzyme Commission numbers or Gene Ontology codes). Finally, we leverage the AlphaFold Structure Database to perform functional annotation at a proteome scale. By applying PARSE to the dark proteome (predicted structures that cannot be classified into known structural families), we predict several novel bacterial metalloproteases. Each of these proteins shares a strongly conserved catalytic site despite highly divergent sequences and global folds, illustrating the value of local structure representations for new function discovery.
format Online
Article
Text
id pubmed-10614799
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10614799 2023-10-31 Explainable protein function annotation using local structure embeddings Derry, Alexander Altman, Russ B. bioRxiv Article
Cold Spring Harbor Laboratory 2023-10-16 /pmc/articles/PMC10614799/ /pubmed/37905033 http://dx.doi.org/10.1101/2023.10.13.562298 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use.
spellingShingle Article
Derry, Alexander
Altman, Russ B.
Explainable protein function annotation using local structure embeddings
title Explainable protein function annotation using local structure embeddings
topic Article