EMCBOW-GPCR: A method for identifying G-protein coupled receptors based on word embedding and wordbooks

G protein-coupled receptors (GPCRs) are one of the largest families of membrane protein receptors in humans and are important targets for many drugs. It is therefore of great significance to determine whether a protein is a GPCR. However, identifying GPCRs by experimental methods is very expensive...

Bibliographic Details
Main authors: Qiu, Wangren; Lv, Zhe; Xiao, Xuan; Shao, Shuai; Lin, Hao
Format: Online Article Text
Language: English
Published: Research Network of Computational and Structural Biotechnology 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8437786/
https://www.ncbi.nlm.nih.gov/pubmed/34527200
http://dx.doi.org/10.1016/j.csbj.2021.08.044
author Qiu, Wangren
Lv, Zhe
Xiao, Xuan
Shao, Shuai
Lin, Hao
collection PubMed
description G protein-coupled receptors (GPCRs) are one of the largest families of membrane protein receptors in humans and are important targets for many drugs. It is therefore of great significance to determine whether a protein is a GPCR. However, identifying GPCRs by experimental methods is very expensive and time-consuming. As more and more GPCR primary sequences accumulate, it has become feasible to develop a computational model that predicts GPCRs precisely and quickly. In this paper, a novel method called EMCBOW-GPCR is proposed to improve the accuracy of identifying GPCRs based on natural language processing (NLP). To represent GPCRs, three word-embedding models and a bag-of-words model are used to extract the original features. The original features are then passed to a deep-learning algorithm that extracts further features and reduces the dimension. Finally, the resulting features are fed into Extreme Gradient Boosting. Comparison of the results shows that the overall prediction metrics of EMCBOW-GPCR are higher than those of state-of-the-art methods. To make EMCBOW-GPCR convenient for more researchers to use, the method and source code have been released on GitHub at https://github.com/454170054/EMCBOW-GPCR, and a user-friendly web server for EMCBOW-GPCR has been established at http://www.jci-bioinfo.cn/emcbowgpcr.
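The description above names a bag-of-words model as one of the feature extractors but does not say how a protein sequence is split into "words". As a rough illustration only, the sketch below assumes overlapping k-residue words (k = 2) counted over the 20 standard amino acids; the function names, the value of k, and the toy sequence are hypothetical and are not taken from the EMCBOW-GPCR source code.

```python
from collections import Counter
from itertools import product

# The 20 standard amino-acid one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_words(sequence, k=2):
    """Split a protein sequence into overlapping k-residue 'words'.

    Assumption: overlapping k-mers stand in for words; the record does not
    state which tokenization the authors actually use.
    """
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def bag_of_words(sequence, k=2):
    """Return a fixed-length count vector over all 20**k possible words."""
    vocabulary = ["".join(p) for p in product(AMINO_ACIDS, repeat=k)]
    counts = Counter(kmer_words(sequence, k))
    return [counts.get(word, 0) for word in vocabulary]


# Hypothetical toy sequence, not drawn from any dataset in the paper.
toy_sequence = "MKTAYIAKQR"
vector = bag_of_words(toy_sequence, k=2)
```

A vector of this kind could then be passed, together with embedding-based features, to a downstream model for dimension reduction and classification; that part of the pipeline is not sketched here.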
format Online
Article
Text
id pubmed-8437786
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Research Network of Computational and Structural Biotechnology
record_format MEDLINE/PubMed
spelling pubmed-8437786 2021-09-14 EMCBOW-GPCR: A method for identifying G-protein coupled receptors based on word embedding and wordbooks Qiu, Wangren Lv, Zhe Xiao, Xuan Shao, Shuai Lin, Hao Comput Struct Biotechnol J Research Article Research Network of Computational and Structural Biotechnology 2021-08-31 /pmc/articles/PMC8437786/ /pubmed/34527200 http://dx.doi.org/10.1016/j.csbj.2021.08.044 Text en © 2021 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title EMCBOW-GPCR: A method for identifying G-protein coupled receptors based on word embedding and wordbooks
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8437786/
https://www.ncbi.nlm.nih.gov/pubmed/34527200
http://dx.doi.org/10.1016/j.csbj.2021.08.044