GPCR-PEnDB: a database of protein sequences and derived features to facilitate prediction and classification of G protein-coupled receptors

G protein-coupled receptors (GPCRs) constitute the largest group of membrane receptor proteins in eukaryotes. Due to their significant roles in various physiological processes such as vision, smell and inflammation, GPCRs are the targets of many prescription drugs. However, the functional and sequen...

Bibliographic Details
Main authors: Begum, Khodeza, Mohl, Jonathon E, Ayivor, Fredrick, Perez, Eder E, Leung, Ming-Ying
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7678784/
https://www.ncbi.nlm.nih.gov/pubmed/33216895
http://dx.doi.org/10.1093/database/baaa087
author Begum, Khodeza
Mohl, Jonathon E
Ayivor, Fredrick
Perez, Eder E
Leung, Ming-Ying
collection PubMed
description G protein-coupled receptors (GPCRs) constitute the largest group of membrane receptor proteins in eukaryotes. Because of their significant roles in physiological processes such as vision, smell and inflammation, GPCRs are the targets of many prescription drugs. However, the functional and sequence diversity of GPCRs has kept their prediction and classification from amino acid sequence data a challenging bioinformatics problem. Existing computational approaches, mainly based on machine learning and statistical methods, predict and classify GPCRs from amino acid sequences and sequence-derived features. In this paper, we describe a searchable MySQL database of confirmed GPCRs and non-GPCRs, named GPCR-PEnDB (GPCR Prediction Ensemble Database). It was constructed to allow users to conveniently access useful information on GPCRs in a wide range of organisms and to compile reliable training and testing datasets for different combinations of computational tools. The database currently contains 3129 confirmed GPCR and 3575 non-GPCR sequences collected from the UniProtKB/Swiss-Prot protein database, encompassing over 1200 species. The non-GPCR entries include transmembrane proteins, so that prediction programs can be evaluated on their ability to distinguish GPCRs from other transmembrane proteins. Each protein is linked to information about its source organism, classification, sequence length and composition, and other derived sequence features. We present examples of using this database, along with its graphical user interface, to query for GPCRs with specific sequence properties and to compare the accuracies of five tools for GPCR prediction. This initial version of GPCR-PEnDB will provide a framework for future extensions to include additional sequence and feature data, facilitating the design and assessment of software tools and experimental studies that help explain the functional roles of GPCRs.
Database URL: gpcr.utep.edu/database
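The description mentions per-protein derived features (sequence length and composition) and a comparison of prediction accuracies. As a rough illustration only, the Python sketch below computes those two kinds of quantities from plain strings and label lists. The function names `sequence_features` and `accuracy` are hypothetical; they are not part of GPCR-PEnDB, and the sketch does not reproduce the database schema or any of the five tools evaluated in the paper.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def sequence_features(sequence: str) -> dict:
    """Return the length and amino acid composition of a protein sequence.

    Composition is the fraction of residues that are each standard amino
    acid. Non-standard letters count toward the length but get no entry.
    """
    seq = sequence.strip().upper()
    length = len(seq)
    counts = Counter(seq)
    composition = {
        aa: (counts[aa] / length if length else 0.0) for aa in AMINO_ACIDS
    }
    return {"length": length, "composition": composition}


def accuracy(predicted: list, actual: list) -> float:
    """Fraction of proteins whose predicted label matches the known label."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        raise ValueError("at least one labelled protein is required")
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


if __name__ == "__main__":
    # Toy sequence and labels, made up for illustration (True = GPCR).
    features = sequence_features("MAAL")
    print(features["length"], features["composition"]["A"])
    print(accuracy([True, False, True, True], [True, False, False, True]))
```

In practice the sequences and labels would come from the curated GPCR and non-GPCR sets, and each prediction tool would supply its own `predicted` list for the same proteins so that the accuracies are comparable.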
format Online
Article
Text
id pubmed-7678784
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7678784 2020-11-25 Database (Oxford) Original Article Oxford University Press 2020-11-20 /pmc/articles/PMC7678784/ /pubmed/33216895 http://dx.doi.org/10.1093/database/baaa087 Text en © The Author(s) 2020. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title GPCR-PEnDB: a database of protein sequences and derived features to facilitate prediction and classification of G protein-coupled receptors
topic Original Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7678784/
https://www.ncbi.nlm.nih.gov/pubmed/33216895
http://dx.doi.org/10.1093/database/baaa087