Protein embeddings and deep learning predict binding residues for various ligand classes
One important aspect of protein function is the binding of proteins to ligands, including small molecules, metal ions, and macromolecules such as DNA or RNA. Despite decades of experimental progress many binding sites remain obscure. Here, we proposed bindEmbed21, a method predicting whether a prote...
Main Authors: | Littmann, Maria; Heinzinger, Michael; Dallago, Christian; Weissenow, Konstantin; Rost, Burkhard |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK 2021 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8668950/ https://www.ncbi.nlm.nih.gov/pubmed/34903827 http://dx.doi.org/10.1038/s41598-021-03431-4 |
_version_ | 1784614689098956800 |
---|---|
author | Littmann, Maria Heinzinger, Michael Dallago, Christian Weissenow, Konstantin Rost, Burkhard |
author_sort | Littmann, Maria |
collection | PubMed |
description | One important aspect of protein function is the binding of proteins to ligands, including small molecules, metal ions, and macromolecules such as DNA or RNA. Despite decades of experimental progress many binding sites remain obscure. Here, we proposed bindEmbed21, a method predicting whether a protein residue binds to metal ions, nucleic acids, or small molecules. The Artificial Intelligence (AI)-based method exclusively uses embeddings from the Transformer-based protein Language Model (pLM) ProtT5 as input. Using only single sequences without creating multiple sequence alignments (MSAs), bindEmbed21DL outperformed MSA-based predictions. Combination with homology-based inference increased performance to F1 = 48 ± 3% (95% CI) and MCC = 0.46 ± 0.04 when merging all three ligand classes into one. All results were confirmed by three independent data sets. Focusing on very reliably predicted residues could complement experimental evidence: For the 25% most strongly predicted binding residues, at least 73% were correctly predicted even when ignoring the problem of missing experimental annotations. The new method bindEmbed21 is fast, simple, and broadly applicable—neither using structure nor MSAs. Thereby, it found binding residues in over 42% of all human proteins not otherwise implied in binding and predicted about 6% of all residues as binding to metal ions, nucleic acids, or small molecules. |
format | Online Article Text |
id | pubmed-8668950 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8668950 2021-12-15 Sci Rep Article Nature Publishing Group UK 2021-12-13 /pmc/articles/PMC8668950/ /pubmed/34903827 http://dx.doi.org/10.1038/s41598-021-03431-4 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | Protein embeddings and deep learning predict binding residues for various ligand classes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8668950/ https://www.ncbi.nlm.nih.gov/pubmed/34903827 http://dx.doi.org/10.1038/s41598-021-03431-4 |
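
The description above reports performance as F1 and MCC over per-residue binding predictions. As a minimal sketch of how these two metrics are computed from a binary confusion matrix, the Python below uses small made-up label lists; the function names and the toy data are hypothetical and are not taken from the article, and the article's own evaluation protocol (for example, how scores are averaged over proteins) is not reproduced here.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary per-residue labels (1 = binding)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 if undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; returns 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels for eight residues (illustration only).
y_true = [1, 1, 0, 0, 0, 1, 0, 0]
y_pred = [1, 0, 0, 0, 1, 1, 0, 0]

tp, fp, tn, fn = confusion_counts(y_true, y_pred)
f1 = f1_score(tp, fp, fn)
m = mcc(tp, fp, tn, fn)
print(f"TP={tp} FP={fp} TN={tn} FN={fn} F1={f1:.3f} MCC={m:.3f}")
```

Unlike F1, MCC also takes true negatives into account, which matters when most residues are non-binding, as the roughly 6% binding fraction quoted in the description suggests.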