Automatic single- and multi-label enzymatic function prediction by machine learning

The number of protein structures in the PDB database has increased more than 15-fold since 1999. The creation of computational models that predict enzymatic function is of major importance, since such models provide the means to better understand the behavior of newly discovered enzymes when cata...

Bibliographic Details
Main authors: Amidi, Shervine; Amidi, Afshine; Vlachakis, Dimitrios; Paragios, Nikos; Zacharaki, Evangelia I.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2017
Subjects: Bioinformatics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374972/
https://www.ncbi.nlm.nih.gov/pubmed/28367366
http://dx.doi.org/10.7717/peerj.3095
author Amidi, Shervine
Amidi, Afshine
Vlachakis, Dimitrios
Paragios, Nikos
Zacharaki, Evangelia I.
collection PubMed
description The number of protein structures in the PDB database has increased more than 15-fold since 1999. The creation of computational models that predict enzymatic function is of major importance, since such models provide the means to better understand the behavior of newly discovered enzymes when catalyzing chemical reactions. Until now, single-label classification has been widely used for predicting enzymatic function, which limits its application to enzymes performing unique reactions and introduces errors when multi-functional enzymes are examined. Indeed, some enzymes may perform different reactions and can hence be directly associated with multiple enzymatic functions. In the present work, we propose a multi-label enzymatic function classification scheme that combines structural and amino acid sequence information. We investigate two fusion approaches (at the feature level and at the decision level) and assess the methodology for general enzymatic function prediction, indicated by the first digit of the enzyme commission (EC) code (six main classes), on 40,034 enzymes from the PDB database. The proposed single-label and multi-label models correctly predict the actual functional activities in 97.8% and 95.5% (based on Hamming loss) of the cases, respectively. In addition, the multi-label model predicts all possible enzymatic reactions in 85.4% of the multi-labeled enzymes when the number of reactions is unknown. Code and datasets are available at https://figshare.com/s/a63e0bafa9b71fc7cbd7.
format Online
Article
Text
id pubmed-5374972
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-5374972 2017-03-31 Automatic single- and multi-label enzymatic function prediction by machine learning Amidi, Shervine; Amidi, Afshine; Vlachakis, Dimitrios; Paragios, Nikos; Zacharaki, Evangelia I. PeerJ Bioinformatics PeerJ Inc. 2017-03-29 /pmc/articles/PMC5374972/ /pubmed/28367366 http://dx.doi.org/10.7717/peerj.3095 Text en © 2017 Amidi et al.
http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ) and either DOI or URL of the article must be cited.
title Automatic single- and multi-label enzymatic function prediction by machine learning
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5374972/
https://www.ncbi.nlm.nih.gov/pubmed/28367366
http://dx.doi.org/10.7717/peerj.3095
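
The abstract in this record reports a multi-label score "based on Hamming loss" and, separately, the share of enzymes for which all reactions are predicted. As a rough illustration of how two such metrics differ, the following minimal Python sketch computes both over the six main EC classes. It is not the authors' code: the function names, the toy label sets, and the assumption that the reported accuracy equals 1 minus the Hamming loss are all illustrative. The exact-match ratio here is taken over every sample, whereas the 85.4% figure in the abstract refers only to multi-labeled enzymes.

```python
from typing import List, Set

# The six main enzyme classes, i.e. the first digit of the EC code.
EC_CLASSES = [1, 2, 3, 4, 5, 6]


def to_indicator(labels: Set[int]) -> List[int]:
    """Turn a set of EC classes into a 0/1 vector of length six."""
    return [1 if c in labels else 0 for c in EC_CLASSES]


def hamming_loss(y_true: List[Set[int]], y_pred: List[Set[int]]) -> float:
    """Fraction of (enzyme, class) pairs whose 0/1 label is predicted wrongly."""
    wrong = 0
    for true_set, pred_set in zip(y_true, y_pred):
        t, p = to_indicator(true_set), to_indicator(pred_set)
        wrong += sum(a != b for a, b in zip(t, p))
    return wrong / (len(y_true) * len(EC_CLASSES))


def exact_match_ratio(y_true: List[Set[int]], y_pred: List[Set[int]]) -> float:
    """Fraction of enzymes whose full predicted label set equals the true set."""
    hits = sum(t == p for t, p in zip(y_true, y_pred))
    return hits / len(y_true)


# Hypothetical toy data: four enzymes, two of them multi-functional.
y_true = [{1}, {2, 3}, {4}, {1, 6}]
y_pred = [{1}, {2}, {4}, {1, 6}]

loss = hamming_loss(y_true, y_pred)
print(f"Hamming loss:           {loss:.4f}")
print(f"Hamming-based accuracy: {1 - loss:.4f}")  # assumed to be 1 - loss
print(f"Exact-match ratio:      {exact_match_ratio(y_true, y_pred):.4f}")
```

In the toy data only one of the 24 enzyme-class pairs is wrong (class 3 is missed for the second enzyme), so the Hamming-based score stays high while the exact-match ratio drops to 3 out of 4. This is why a record can report a Hamming-based figure that is noticeably higher than the share of enzymes with every reaction predicted.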