
TT3D: Leveraging precomputed protein 3D sequence models to predict protein–protein interactions

MOTIVATION: High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method...


Bibliographic Details
Main authors: Sledzieski, Samuel, Devkota, Kapil, Singh, Rohit, Cowen, Lenore, Berger, Bonnie
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10640393/
https://www.ncbi.nlm.nih.gov/pubmed/37897686
http://dx.doi.org/10.1093/bioinformatics/btad663
author Sledzieski, Samuel
Devkota, Kapil
Singh, Rohit
Cowen, Lenore
Berger, Bonnie
collection PubMed
description MOTIVATION: High-quality computational structural models are now precomputed and available for nearly every protein in UniProt. However, the best way to leverage these models to predict which pairs of proteins interact in a high-throughput manner is not immediately clear. The recent Foldseek method of van Kempen et al. encodes the structural information of distances and angles along the protein backbone into a linear string of the same length as the protein string, using tokens from a 21-letter discretized structural alphabet (3Di). RESULTS: We show that using both the amino acid sequence and the 3Di sequence generated by Foldseek as inputs to our recent deep-learning method, Topsy-Turvy, substantially improves the performance of predicting protein–protein interactions cross-species. Thus TT3D (Topsy-Turvy 3D) presents a way to reuse all the computational effort going into producing high-quality structural models from sequence, while being sufficiently lightweight so that high-quality binary protein–protein interaction predictions across all protein pairs can be made genome-wide. AVAILABILITY AND IMPLEMENTATION: TT3D is available at https://github.com/samsledje/D-SCRIPT. An archived version of the code at time of submission can be found at https://zenodo.org/records/10037674.
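The description above notes that the 3Di string has the same length as the protein string, so the two can be aligned position by position. The following is a minimal sketch of one plausible way to pair them, not the TT3D implementation: it one-hot encodes each 3Di token and appends it to a per-residue feature vector. The alphabet letters, the function names, and the placeholder feature values are all assumptions made for illustration, and the 3Di string is assumed to have been generated by Foldseek beforehand.

```python
# Hypothetical 21-letter alphabet: 20 letters plus 'X' for unknown tokens.
# The exact letters are an assumption for illustration only.
THREE_DI_ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"


def one_hot_3di(three_di):
    """One-hot encode a 3Di string; unrecognized letters map to 'X'."""
    index = {ch: i for i, ch in enumerate(THREE_DI_ALPHABET)}
    rows = []
    for ch in three_di.upper():
        row = [0.0] * len(THREE_DI_ALPHABET)
        row[index.get(ch, index["X"])] = 1.0
        rows.append(row)
    return rows


def combine_features(residue_features, three_di):
    """Append the one-hot 3Di token to the feature vector at each position."""
    if len(residue_features) != len(three_di):
        raise ValueError("3Di string must match the protein sequence length")
    encoded = one_hot_3di(three_di)
    return [list(feat) + hot for feat, hot in zip(residue_features, encoded)]


# Usage with placeholder values: three residues, two made-up features each.
features = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
combined = combine_features(features, "DPV")
print(len(combined), len(combined[0]))  # 3 23
```

Concatenation keeps the per-residue layout intact, so a downstream model that consumes one vector per position only needs to accept a wider input.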
format Online
Article
Text
id pubmed-10640393
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10640393 2023-10-28 Bioinformatics Applications Note Oxford University Press 2023-10-28 /pmc/articles/PMC10640393/ /pubmed/37897686 http://dx.doi.org/10.1093/bioinformatics/btad663 Text en © The Author(s) 2023. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title TT3D: Leveraging precomputed protein 3D sequence models to predict protein–protein interactions
topic Applications Note