COLLAPSE: A representation learning framework for identification and characterization of protein structural sites
The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to...
Main authors: | Derry, Alexander; Altman, Russ B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | John Wiley & Sons, Inc. 2023 |
Subjects: | Tools for Protein Science |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9847082/ https://www.ncbi.nlm.nih.gov/pubmed/36519247 http://dx.doi.org/10.1002/pro.4541 |
_version_ | 1784871358516166656 |
---|---|
author | Derry, Alexander Altman, Russ B. |
author_facet | Derry, Alexander Altman, Russ B. |
author_sort | Derry, Alexander |
collection | PubMed |
description | The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site‐specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self‐supervision signal, enabling learned embeddings to implicitly capture structure–function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state‐of‐the‐art performance on standardized benchmarks (protein–protein interactions and mutation stability) and on the prediction of functional sites from the PROSITE database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general‐purpose platform for computational protein analysis. |
format | Online Article Text |
id | pubmed-9847082 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | John Wiley & Sons, Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-98470822023-02-01 COLLAPSE: A representation learning framework for identification and characterization of protein structural sites Derry, Alexander Altman, Russ B. Protein Sci Tools for Protein Science The identification and characterization of the structural sites which contribute to protein function are crucial for understanding biological mechanisms, evaluating disease risk, and developing targeted therapies. However, the quantity of known protein structures is rapidly outpacing our ability to functionally annotate them. Existing methods for function prediction either do not operate on local sites, suffer from high false positive or false negative rates, or require large site‐specific training datasets, necessitating the development of new computational methods for annotating functional sites at scale. We present COLLAPSE (Compressed Latents Learned from Aligned Protein Structural Environments), a framework for learning deep representations of protein sites. COLLAPSE operates directly on the 3D positions of atoms surrounding a site and uses evolutionary relationships between homologous proteins as a self‐supervision signal, enabling learned embeddings to implicitly capture structure–function relationships within each site. Our representations generalize across disparate tasks in a transfer learning context, achieving state‐of‐the‐art performance on standardized benchmarks (protein–protein interactions and mutation stability) and on the prediction of functional sites from the prosite database. We use COLLAPSE to search for similar sites across large protein datasets and to annotate proteins based on a database of known functional sites. These methods demonstrate that COLLAPSE is computationally efficient, tunable, and interpretable, providing a general‐purpose platform for computational protein analysis. John Wiley & Sons, Inc. 2023-02-01 /pmc/articles/PMC9847082/ /pubmed/36519247 http://dx.doi.org/10.1002/pro.4541 Text en © 2022 The Authors. 
Protein Science published by Wiley Periodicals LLC on behalf of The Protein Society. This is an open access article under the terms of the https://creativecommons.org/licenses/by-nc-nd/4.0/ License, which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made. |
spellingShingle | Tools for Protein Science Derry, Alexander Altman, Russ B. COLLAPSE: A representation learning framework for identification and characterization of protein structural sites |
title | COLLAPSE: A representation learning framework for identification and characterization of protein structural sites |
title_full | COLLAPSE: A representation learning framework for identification and characterization of protein structural sites |
title_fullStr | COLLAPSE: A representation learning framework for identification and characterization of protein structural sites |
title_full_unstemmed | COLLAPSE: A representation learning framework for identification and characterization of protein structural sites |
title_short | COLLAPSE: A representation learning framework for identification and characterization of protein structural sites |
title_sort | collapse: a representation learning framework for identification and characterization of protein structural sites |
topic | Tools for Protein Science |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9847082/ https://www.ncbi.nlm.nih.gov/pubmed/36519247 http://dx.doi.org/10.1002/pro.4541 |
work_keys_str_mv | AT derryalexander collapsearepresentationlearningframeworkforidentificationandcharacterizationofproteinstructuralsites AT altmanrussb collapsearepresentationlearningframeworkforidentificationandcharacterizationofproteinstructuralsites |