
ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature

BACKGROUND: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commi...


Bibliographic Details
Main authors: Dalkiran, Alperen; Rifaioglu, Ahmet Sureyya; Martin, Maria Jesus; Cetin-Atalay, Rengul; Atalay, Volkan; Doğan, Tunca
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6150975/
https://www.ncbi.nlm.nih.gov/pubmed/30241466
http://dx.doi.org/10.1186/s12859-018-2368-y
author Dalkiran, Alperen
Rifaioglu, Ahmet Sureyya
Martin, Maria Jesus
Cetin-Atalay, Rengul
Atalay, Volkan
Doğan, Tunca
collection PubMed
description BACKGROUND: The automated prediction of the enzymatic functions of uncharacterized proteins is a crucial topic in bioinformatics. Although several methods and tools have been proposed to classify enzymes, most of these studies are limited to specific functional classes and levels of the Enzyme Commission (EC) number hierarchy. Besides, most of the previous methods incorporated only a single input feature type, which limits their applicability across the wide functional space. Here, we propose a novel enzymatic function prediction tool, ECPred, based on an ensemble of machine learning classifiers. RESULTS: In ECPred, each EC number constitutes an individual class and therefore has its own independent learning model. Enzyme vs. non-enzyme classification is incorporated into ECPred along with a hierarchical prediction approach exploiting the tree structure of the EC nomenclature. ECPred provides predictions for 858 EC numbers in total, including 6 main classes, 55 subclasses, 163 sub-subclasses and 634 substrate classes. The proposed method is tested and compared with the state-of-the-art enzyme function prediction tools by using independent temporal hold-out and no-Pfam datasets constructed during this study. CONCLUSIONS: ECPred is presented both as a stand-alone and a web-based tool to provide probabilistic enzymatic function predictions (at all five levels of EC) for uncharacterized protein sequences. Also, the datasets of this study will be a valuable resource for future benchmarking studies. ECPred is available for download, together with all of the datasets used in this study, at: https://github.com/cansyl/ECPred. The ECPred webserver can be accessed through http://cansyl.metu.edu.tr/ECPred.html.
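The hierarchical idea in the abstract (one independent model per EC number, an enzyme vs. non-enzyme check first, then a descent through the EC tree) can be illustrated with a minimal sketch. This is not ECPred's implementation: the function names `children` and `predict_ec`, the 0.5 threshold, the "pick the best-scoring child" rule, and the constant toy scorers are all assumptions made for illustration only.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical: each EC node has its own independent scorer that maps a
# protein sequence to a probability-like score in [0, 1].
Scorer = Callable[[str], float]


def children(ec: str, classes: List[str]) -> List[str]:
    """Return the EC numbers in `classes` that are direct children of `ec`.

    The empty string stands for the root of the EC tree.
    """
    depth = 0 if ec == "" else ec.count(".") + 1
    prefix = "" if ec == "" else ec + "."
    return [c for c in classes if c.startswith(prefix) and c.count(".") == depth]


def predict_ec(
    sequence: str,
    enzyme_scorer: Scorer,
    scorers: Dict[str, Scorer],
    threshold: float = 0.5,  # assumed cut-off, not taken from the paper
) -> List[Tuple[str, float]]:
    """Walk the EC tree top-down, keeping the best child at each level."""
    path: List[Tuple[str, float]] = []
    # Level 0: enzyme vs. non-enzyme.
    if enzyme_scorer(sequence) < threshold:
        return path  # predicted non-enzyme, no EC number assigned
    current = ""
    while True:
        candidates = children(current, list(scorers))
        if not candidates:
            break  # reached a leaf of the modelled hierarchy
        best_score, best = max((scorers[c](sequence), c) for c in candidates)
        if best_score < threshold:
            break  # no child is confident enough; stop at the current level
        path.append((best, best_score))
        current = best
    return path


# Toy scorers with fixed outputs, purely for demonstration.
toy_scorers: Dict[str, Scorer] = {
    "1": lambda s: 0.2,
    "3": lambda s: 0.9,
    "3.1": lambda s: 0.3,
    "3.4": lambda s: 0.8,
    "3.4.21": lambda s: 0.4,
}

result = predict_ec("MKTAYIAKQR", lambda s: 0.95, toy_scorers)
print(result)  # [('3', 0.9), ('3.4', 0.8)]
```

In this toy run the descent stops at EC 3.4 because the only modelled child, 3.4.21, scores below the assumed threshold, so a partial EC number is returned rather than a forced full-depth prediction.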
format Online
Article
Text
id pubmed-6150975
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6150975 2018-09-26
ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature
Dalkiran, Alperen; Rifaioglu, Ahmet Sureyya; Martin, Maria Jesus; Cetin-Atalay, Rengul; Atalay, Volkan; Doğan, Tunca
BMC Bioinformatics
Software
BioMed Central 2018-09-21
/pmc/articles/PMC6150975/ /pubmed/30241466 http://dx.doi.org/10.1186/s12859-018-2368-y
Text en
© The Author(s). 2018 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title ECPred: a tool for the prediction of the enzymatic functions of protein sequences based on the EC nomenclature
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6150975/
https://www.ncbi.nlm.nih.gov/pubmed/30241466
http://dx.doi.org/10.1186/s12859-018-2368-y