
COLLAPSE: A representation learning framework for identification and characterization of protein structural sites

The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to...


Bibliographic Details
Main authors: Derry, Alexander; Altman, Russ B.
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2023
Subjects: Tools for Protein Science
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9847082/
https://www.ncbi.nlm.nih.gov/pubmed/36519247
http://dx.doi.org/10.1002/pro.4541
author Derry, Alexander
Altman, Russ B.
collection PubMed
description The identification and characterization of the structural sites that contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site‐specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self‐supervision signal, enabling learned embeddings to implicitly capture structure–function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state‐of‐the‐art performance on standardized benchmarks (protein–protein interactions and mutation stability) and on the prediction of functional sites from the PROSITE database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general‐purpose platform for computational protein analysis.
format Online
Article
Text
id pubmed-9847082
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-9847082 2023-02-01 COLLAPSE: A representation learning framework for identification and characterization of protein structural sites Derry, Alexander Altman, Russ B. Protein Sci Tools for Protein Science The identification and characterization of the structural sites that contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site‐specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self‐supervision signal, enabling learned embeddings to implicitly capture structure–function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state‐of‐the‐art performance on standardized benchmarks (protein–protein interactions and mutation stability) and on the prediction of functional sites from the PROSITE database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general‐purpose platform for computational protein analysis. John Wiley & Sons, Inc. 2023-02-01 /pmc/articles/PMC9847082/ /pubmed/36519247 http://dx.doi.org/10.1002/pro.4541 Text en © 2022 The Authors. 
Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society. This is an open access article under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made.
title COLLAPSE: A representation learning framework for identification and characterization of protein structural sites
topic Tools for Protein Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9847082/
https://www.ncbi.nlm.nih.gov/pubmed/36519247
http://dx.doi.org/10.1002/pro.4541