MSA-Regularized Protein Sequence Transformer toward Predicting Genome-Wide Chemical-Protein Interactions: Application to GPCRome Deorphanization


Bibliographic Details
Main authors:	Cai, Tian; Lim, Hansaim; Abbu, Kyra Alyssa; Qiu, Yue; Nussinov, Ruth; Xie, Lei
Format:	Online Article Text
Language:	English
Published:	American Chemical Society 2021
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8154251/ https://www.ncbi.nlm.nih.gov/pubmed/33757283 http://dx.doi.org/10.1021/acs.jcim.0c01285

author	Cai, Tian; Lim, Hansaim; Abbu, Kyra Alyssa; Qiu, Yue; Nussinov, Ruth; Xie, Lei
author_sort	Cai, Tian
collection	PubMed
description	Small molecules play a critical role in modulating biological systems. Knowledge of chemical–protein interactions helps address fundamental and practical questions in biology and medicine. However, with the rapid emergence of newly sequenced genes, the endogenous or surrogate ligands of a vast number of proteins remain unknown. Homology modeling and machine learning are two major methods for assigning new ligands to a protein, but they mostly fail when sequence homology between an unannotated protein and those with known functions or structures is low. In this study, we develop a new deep learning framework to predict chemical binding to evolutionarily divergent unannotated proteins, whose ligands cannot be reliably predicted by existing methods. By incorporating evolutionary information into self-supervised learning of unlabeled protein sequences, we develop a novel method, distilled sequence alignment embedding (DISAE), for protein sequence representation. DISAE can utilize all protein sequences and their multiple sequence alignments (MSAs) to capture functional relationships between proteins without knowledge of their structure and function. Following DISAE pretraining, we devise a module-based fine-tuning strategy for the supervised learning of chemical–protein interactions. In the benchmark studies, DISAE significantly improves the generalizability of machine learning models and outperforms the state-of-the-art methods by a large margin. Comprehensive ablation studies suggest that the use of MSA, sequence distillation, and triplet pretraining contributes critically to the success of DISAE. The interpretability analysis of DISAE suggests that it learns biologically meaningful information. We further use DISAE to assign ligands to human orphan G-protein-coupled receptors (GPCRs) and to cluster the human GPCRome by integrating their phylogenetic and ligand relationships. The promising results of DISAE open an avenue for exploring the chemical landscape of entire sequenced genomes.
format	Online Article Text
id	pubmed-8154251
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	American Chemical Society
record_format	MEDLINE/PubMed
spelling	pubmed-8154251 2021-05-27 J Chem Inf Model American Chemical Society 2021-03-23 2021-04-26 /pmc/articles/PMC8154251/ /pubmed/33757283 http://dx.doi.org/10.1021/acs.jcim.0c01285 Text en © 2021 The Authors. Published by American Chemical Society. Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title	MSA-Regularized Protein Sequence Transformer toward Predicting Genome-Wide Chemical-Protein Interactions: Application to GPCRome Deorphanization
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8154251/ https://www.ncbi.nlm.nih.gov/pubmed/33757283 http://dx.doi.org/10.1021/acs.jcim.0c01285

Similar Items