EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks

Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for chara...

Bibliographic Details
Main Authors: Roche, Rahmatullah, Moussad, Bernard, Shuvo, Md Hossain, Tarafder, Sumit, Bhattacharya, Debswapna
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515942/
https://www.ncbi.nlm.nih.gov/pubmed/37745556
http://dx.doi.org/10.1101/2023.09.14.557719
_version_ 1785109047539662848
author Roche, Rahmatullah
Moussad, Bernard
Shuvo, Md Hossain
Tarafder, Sumit
Bhattacharya, Debswapna
author_sort Roche, Rahmatullah
collection PubMed
description Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.
format Online
Article
Text
id pubmed-10515942
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10515942 2023-09-23 EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks Roche, Rahmatullah Moussad, Bernard Shuvo, Md Hossain Tarafder, Sumit Bhattacharya, Debswapna bioRxiv Article Protein language models (pLMs) trained on a large corpus of protein sequences have shown unprecedented scalability and broad generalizability in a wide range of predictive modeling tasks, but their power has not yet been harnessed for predicting protein-nucleic acid binding sites, critical for characterizing the interactions between proteins and nucleic acids. Here we present EquiPNAS, a new pLM-informed E(3) equivariant deep graph neural network framework for improved protein-nucleic acid binding site prediction. By combining the strengths of pLM and symmetry-aware deep graph learning, EquiPNAS consistently outperforms the state-of-the-art methods for both protein-DNA and protein-RNA binding site prediction on multiple datasets across a diverse set of predictive modeling scenarios ranging from using experimental input to AlphaFold2 predictions. Our ablation study reveals that the pLM embeddings used in EquiPNAS are sufficiently powerful to dramatically reduce the dependence on the availability of evolutionary information without compromising on accuracy, and that the symmetry-aware nature of the E(3) equivariant graph-based neural architecture offers remarkable robustness and performance resilience. EquiPNAS is freely available at https://github.com/Bhattacharya-Lab/EquiPNAS.
Cold Spring Harbor Laboratory 2023-09-16 /pmc/articles/PMC10515942/ /pubmed/37745556 http://dx.doi.org/10.1101/2023.09.14.557719 Text en https://creativecommons.org/licenses/by-nd/4.0/ This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.
title EquiPNAS: improved protein-nucleic acid binding site prediction using protein-language-model-informed equivariant deep graph neural networks
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10515942/
https://www.ncbi.nlm.nih.gov/pubmed/37745556
http://dx.doi.org/10.1101/2023.09.14.557719