Motif2Mol: Prediction of New Active Compounds Based on Sequence Motifs of Ligand Binding Sites in Proteins Using a Biochemical Language Model

In drug design, the prediction of new active compounds from protein sequence data has only been attempted in a few studies thus far. This prediction task is principally challenging because global protein sequence similarity has strong evolutional and structural implications, but is often only vaguel...

Bibliographic Details
Main Authors: Yoshimori, Atsushi; Bajorath, Jürgen
Format: Online Article Text
Language: English
Published: MDPI 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10216779/
https://www.ncbi.nlm.nih.gov/pubmed/37238703
http://dx.doi.org/10.3390/biom13050833
_version_ 1785048379797012480
author Yoshimori, Atsushi
Bajorath, Jürgen
collection PubMed
description In drug design, the prediction of new active compounds from protein sequence data has been attempted in only a few studies thus far. This prediction task is challenging in principle because global protein sequence similarity has strong evolutionary and structural implications, but is often only vaguely related to ligand binding. Deep language models adapted from natural language processing offer new opportunities to attempt such predictions via machine translation by directly relating amino acid sequences and chemical structures to each other based on textual molecular representations. Herein, we introduce a biochemical language model with transformer architecture for the prediction of new active compounds from sequence motifs of ligand binding sites. In a proof-of-concept application on inhibitors of more than 200 human kinases, the Motif2Mol model revealed promising learning characteristics and an unprecedented ability to consistently reproduce known inhibitors of different kinases.
format Online
Article
Text
id pubmed-10216779
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10216779 2023-05-27 Motif2Mol: Prediction of New Active Compounds Based on Sequence Motifs of Ligand Binding Sites in Proteins Using a Biochemical Language Model Yoshimori, Atsushi; Bajorath, Jürgen Biomolecules Article MDPI 2023-05-13 /pmc/articles/PMC10216779/ /pubmed/37238703 http://dx.doi.org/10.3390/biom13050833 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Motif2Mol: Prediction of New Active Compounds Based on Sequence Motifs of Ligand Binding Sites in Proteins Using a Biochemical Language Model
topic Article