Protein embeddings and deep learning predict binding residues for various ligand classes

Bibliographic Details
Main Authors: Littmann, Maria; Heinzinger, Michael; Dallago, Christian; Weissenow, Konstantin; Rost, Burkhard
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK, 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8668950/
https://www.ncbi.nlm.nih.gov/pubmed/34903827
http://dx.doi.org/10.1038/s41598-021-03431-4
collection PubMed
description One important aspect of protein function is the binding of proteins to ligands, including small molecules, metal ions, and macromolecules such as DNA or RNA. Despite decades of experimental progress, many binding sites remain obscure. Here, we propose bindEmbed21, a method predicting whether a protein residue binds to metal ions, nucleic acids, or small molecules. The Artificial Intelligence (AI)-based method exclusively uses embeddings from the Transformer-based protein Language Model (pLM) ProtT5 as input. Using only single sequences without creating multiple sequence alignments (MSAs), the deep-learning component bindEmbed21DL outperformed MSA-based predictions. Combination with homology-based inference increased performance to F1 = 48 ± 3% (95% CI) and MCC = 0.46 ± 0.04 when merging all three ligand classes into one. All results were confirmed by three independent data sets. Focusing on very reliably predicted residues could complement experimental evidence: for the 25% most strongly predicted binding residues, at least 73% were correctly predicted even when ignoring the problem of missing experimental annotations. The new method bindEmbed21 is fast, simple, and broadly applicable, using neither structure nor MSAs. Thereby, it found binding residues in over 42% of all human proteins not otherwise implicated in binding and predicted about 6% of all residues as binding to metal ions, nucleic acids, or small molecules.
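The headline numbers in the abstract are per-residue classification metrics computed after collapsing the three ligand classes into a single binding/non-binding label. The following is a minimal sketch of that bookkeeping in plain Python, not the authors' evaluation code. It assumes 0/1 labels per ligand class, a hypothetical score threshold of 0.5 for calling a residue binding, and hypothetical helper names (`merge_ligand_classes`, `confusion`, `f1_score`, `mcc`, `precision_of_top_fraction`). It does not compute the reported 95% confidence intervals.

```python
from __future__ import annotations

import math
from typing import Sequence


def merge_ligand_classes(*per_class: Sequence[int]) -> list[int]:
    """Collapse per-class 0/1 labels: a residue is binding if any class marks it."""
    length = len(per_class[0])
    if any(len(c) != length for c in per_class):
        raise ValueError("every class must label the same residues")
    return [int(any(c[i] for c in per_class)) for i in range(length)]


def confusion(pred: Sequence[int], true: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (TP, FP, FN, TN) for binary per-residue predictions."""
    pairs = list(zip(pred, true))
    tp = sum(1 for p, t in pairs if p and t)
    fp = sum(1 for p, t in pairs if p and not t)
    fn = sum(1 for p, t in pairs if not p and t)
    tn = sum(1 for p, t in pairs if not p and not t)
    return tp, fp, fn, tn


def f1_score(pred: Sequence[int], true: Sequence[int]) -> float:
    """F1 = 2TP / (2TP + FP + FN); returns 0.0 when undefined."""
    tp, fp, fn, _ = confusion(pred, true)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(pred: Sequence[int], true: Sequence[int]) -> float:
    """Matthews correlation coefficient; returns 0.0 when any marginal is empty."""
    tp, fp, fn, tn = confusion(pred, true)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def precision_of_top_fraction(
    scores: Sequence[float],
    true: Sequence[int],
    fraction: float = 0.25,
    threshold: float = 0.5,  # hypothetical cutoff for "predicted binding"
) -> float:
    """Precision among the top `fraction` of predicted binding residues, ranked by score."""
    predicted = sorted(
        (i for i, s in enumerate(scores) if s >= threshold),
        key=lambda i: scores[i],
        reverse=True,
    )
    if not predicted:
        return 0.0
    k = max(1, math.ceil(len(predicted) * fraction))
    return sum(true[i] for i in predicted[:k]) / k


# Usage on a toy six-residue protein (hypothetical labels and scores).
metal = [1, 0, 0, 0, 0, 1]
nucleic = [0, 1, 0, 0, 0, 0]
small = [0, 0, 0, 0, 0, 1]
scores = [0.9, 0.4, 0.6, 0.1, 0.2, 0.8]

true = merge_ligand_classes(metal, nucleic, small)  # [1, 1, 0, 0, 0, 1]
pred = [int(s >= 0.5) for s in scores]              # [1, 0, 1, 0, 0, 1]
print(f"F1={f1_score(pred, true):.3f} MCC={mcc(pred, true):.3f}")  # F1=0.667 MCC=0.333
print(f"top-25% precision={precision_of_top_fraction(scores, true):.2f}")  # 1.00
```

Counting a residue as binding when it binds any of the three classes mirrors the abstract's "merging all three ligand classes into one"; the top-fraction helper is one possible reading of "the 25% most strongly predicted binding residues".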
id pubmed-8668950
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Sci Rep (Scientific Reports)
publication Nature Publishing Group UK, 2021-12-13
rights © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
topic Article