
Identifying GPCR-drug interaction based on wordbook learning from sequences

BACKGROUND: G protein-coupled receptors (GPCRs) mediate a variety of important physiological functions, are closely related to many diseases, and constitute the most important target family of modern drugs. Therefore, the research of GPCR analysis and GPCR ligand screening is the hotspot of new drug...


Bibliographic Details
Main Authors: Wang, Pu, Huang, Xiaotong, Qiu, Wangren, Xiao, Xuan
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7171867/
https://www.ncbi.nlm.nih.gov/pubmed/32312232
http://dx.doi.org/10.1186/s12859-020-3488-8
author Wang, Pu
Huang, Xiaotong
Qiu, Wangren
Xiao, Xuan
collection PubMed
description BACKGROUND: G protein-coupled receptors (GPCRs) mediate a variety of important physiological functions, are closely related to many diseases, and constitute the most important target family of modern drugs. Research on GPCR analysis and GPCR ligand screening is therefore a hotspot of new drug development. Accurately identifying GPCR-drug interactions is one of the key steps in designing GPCR-targeted drugs. However, it is prohibitively expensive to ascertain the interactions of GPCR-drug pairs experimentally on a large scale, so predicting them directly from molecular sequences is of great significance. With the accumulation of known GPCR-drug interaction data, it has become feasible to develop sequence-based machine learning models for query GPCR-drug pairs. RESULTS: In this paper, a new sequence-based method is proposed to identify GPCR-drug interactions. For GPCRs, we use a novel bag-of-words (BoW) model to extract sequence features; it captures more pattern information, from low order to high order, while limiting the dimension of the feature space. For drug molecules, we use the discrete Fourier transform (DFT) to extract higher-order pattern information from the original molecular fingerprints. The feature vectors of the two kinds of molecules are concatenated and fed into a simple prediction engine, the distance-weighted K-nearest-neighbor (DWKNN) classifier. This basic method can easily be enhanced through ensemble learning. Tests on recently constructed GPCR-drug interaction datasets show that the proposed methods generalize better than existing sequence-based machine learning methods, including an unconventional method whose prediction performance was further improved by a post-processing procedure (PPP).
CONCLUSIONS: The proposed methods are effective for GPCR-drug interaction prediction and may also be applicable to other target-drug interaction prediction or to protein-protein interaction prediction. In addition, the newly proposed feature extraction method for GPCR sequences is a modified version of the traditional BoW model and may be useful for problems of protein classification or attribute prediction. The source code of the proposed methods is freely available for academic research at https://github.com/wp3751/GPCR-Drug-Interaction.
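The pipeline outlined in the abstract (sequence features, DFT of the fingerprint, concatenation, distance-weighted nearest-neighbor vote) can be illustrated with a minimal sketch. This is not the authors' implementation, which is in the repository linked above. A plain k-mer count stands in for the paper's modified BoW wordbook, inverse-distance weights stand in for whatever weighting the paper uses, and all function names, parameters and toy data are hypothetical.

```python
from collections import Counter
from itertools import product

import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_bow(sequence, k=2):
    """Plain k-mer bag-of-words (assumed stand-in for the paper's wordbook):
    normalised counts of every length-k word over the 20 amino acids."""
    vocab = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))
    vec = np.array([counts[w] for w in vocab], dtype=float)
    total = vec.sum()
    return vec / total if total > 0 else vec


def dft_features(fingerprint):
    """Magnitude spectrum of a binary fingerprint via the discrete Fourier transform."""
    bits = np.asarray(fingerprint, dtype=float)
    return np.abs(np.fft.fft(bits))


def pair_features(sequence, fingerprint, k=2):
    """Concatenate the receptor features and the drug features for one pair."""
    return np.concatenate([kmer_bow(sequence, k), dft_features(fingerprint)])


def dwknn_predict(train_x, train_y, query, k=3, eps=1e-9):
    """Distance-weighted k-NN vote; inverse-distance weights are an assumption."""
    train_x = np.asarray(train_x, dtype=float)
    train_y = np.asarray(train_y)
    dists = np.linalg.norm(train_x - query, axis=1)
    nearest = np.argsort(dists)[:k]
    weights = 1.0 / (dists[nearest] + eps)
    scores = {}
    for label, w in zip(train_y[nearest], weights):
        scores[label] = scores.get(label, 0.0) + w
    return max(scores, key=scores.get)


# Toy data (made up): (sequence, 8-bit fingerprint, interaction label).
train_pairs = [
    ("ACDEFGHIKL", [1, 0, 1, 1, 0, 0, 1, 0], 1),
    ("ACDEFGHIKM", [1, 0, 1, 1, 0, 0, 1, 1], 1),
    ("WWYYVVTTSS", [0, 1, 0, 0, 1, 1, 0, 1], 0),
    ("WWYYVVTTSR", [0, 1, 0, 0, 1, 1, 0, 0], 0),
]
X = np.array([pair_features(s, fp) for s, fp, _ in train_pairs])
y = np.array([label for _, _, label in train_pairs])
query = pair_features("ACDEFGHIKL", [1, 0, 1, 1, 0, 0, 1, 0])
prediction = dwknn_predict(X, y, query, k=3)
print(prediction)
```

Because closer neighbors receive larger weights, a query that coincides with a training pair is dominated by that pair's label, which is the behavior that distinguishes a distance-weighted vote from a plain majority vote.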
format Online
Article
Text
id pubmed-7171867
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7171867 2020-04-24 Identifying GPCR-drug interaction based on wordbook learning from sequences Wang, Pu Huang, Xiaotong Qiu, Wangren Xiao, Xuan BMC Bioinformatics Methodology Article BioMed Central 2020-04-20 /pmc/articles/PMC7171867/ /pubmed/32312232 http://dx.doi.org/10.1186/s12859-020-3488-8 Text en © The Author(s). 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Identifying GPCR-drug interaction based on wordbook learning from sequences
topic Methodology Article