
From sequence to enzyme mechanism using multi-label machine learning

Bibliographic Details
Main authors: De Ferrari, Luna; Mitchell, John BO
Format: Online article, text
Language: English
Published: BioMed Central, 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4229970/
https://www.ncbi.nlm.nih.gov/pubmed/24885296
http://dx.doi.org/10.1186/1471-2105-15-150
Collection: PubMed
Abstract:
BACKGROUND: In this work we predict enzyme function at the level of chemical mechanism, providing a finer granularity of annotation than traditional Enzyme Commission (EC) classes. Hence we can predict not only whether a putative enzyme in a newly sequenced organism has the potential to perform a certain reaction, but how the reaction is performed, using which cofactors and with susceptibility to which drugs or inhibitors, details with important consequences for drug and enzyme design. Work that predicts enzyme catalytic activity based on 3D protein structure features limits the prediction of mechanism to proteins already having either a solved structure or a close relative suitable for homology modelling.

RESULTS: In this study, we evaluate whether sequence identity, InterPro or Catalytic Site Atlas sequence signatures provide enough information for bulk prediction of enzyme mechanism. By splitting MACiE (Mechanism, Annotation and Classification in Enzymes database) mechanism labels to a finer granularity, which includes the role of the protein chain in the overall enzyme complex, the method can predict at 96% accuracy (and 96% micro-averaged precision, 99.9% macro-averaged recall) the MACiE mechanism definitions of 248 proteins available in the MACiE, EzCatDb (Database of Enzyme Catalytic Mechanisms) and SFLD (Structure Function Linkage Database) databases using an off-the-shelf K-Nearest Neighbours multi-label algorithm.

CONCLUSION: We find that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also find that incorporating Catalytic Site Atlas attributes does not seem to provide additional accuracy. The software code (ml2db), data and results are available online at http://sourceforge.net/projects/ml2db/ and as supplementary files.
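The abstract describes predicting a set of mechanism labels per protein from sequence-signature features with an off-the-shelf multi-label K-Nearest Neighbours algorithm, then scoring it by accuracy, micro-averaged precision and macro-averaged recall. The Python sketch below illustrates those ideas only; it is not the ml2db code, and the record does not say which kNN variant or metric definitions the authors used. Assumptions: each protein is a set of signature identifiers (standing in for InterPro or Catalytic Site Atlas attributes), distance is Jaccard distance, a label is predicted when at least half of the k nearest neighbours carry it, and "accuracy" is example-based Jaccard accuracy, one common multi-label definition. The function names and toy identifiers are hypothetical.

```python
from collections import Counter
from typing import Dict, FrozenSet, List, Tuple

# Hypothetical representations (assumptions, not from the paper):
Features = FrozenSet[str]  # set of sequence-signature IDs for one protein
Labels = FrozenSet[str]    # set of mechanism labels for one protein


def jaccard_distance(a: Features, b: Features) -> float:
    """1 - |a & b| / |a | b|; two empty sets are treated as identical."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def knn_predict(train: List[Tuple[Features, Labels]], query: Features, k: int = 3) -> Labels:
    """Simple majority-vote multi-label kNN (one variant among several).

    Ties in distance are broken by training order, because sorted() is stable.
    """
    neighbours = sorted(train, key=lambda item: jaccard_distance(query, item[0]))[:k]
    votes = Counter(label for _, labels in neighbours for label in labels)
    threshold = len(neighbours) / 2
    return frozenset(label for label, count in votes.items() if count >= threshold)


def evaluate(truth: List[Labels], pred: List[Labels]) -> Dict[str, float]:
    """Example-based accuracy, micro-averaged precision, macro-averaged recall."""
    tp: Counter = Counter()
    fp: Counter = Counter()
    fn: Counter = Counter()
    jaccard_scores = []
    for y, z in zip(truth, pred):
        for label in y & z:
            tp[label] += 1
        for label in z - y:
            fp[label] += 1
        for label in y - z:
            fn[label] += 1
        union = y | z
        # An empty truth and an empty prediction count as a perfect match.
        jaccard_scores.append(len(y & z) / len(union) if union else 1.0)

    # Micro precision pools true and false positives over all labels.
    predicted = sum(tp.values()) + sum(fp.values())
    micro_precision = sum(tp.values()) / predicted if predicted else 0.0

    # Macro recall averages per-label recall over labels present in the truth.
    positive_labels = set(tp) | set(fn)
    macro_recall = (
        sum(tp[label] / (tp[label] + fn[label]) for label in positive_labels) / len(positive_labels)
        if positive_labels
        else 0.0
    )

    accuracy = sum(jaccard_scores) / len(jaccard_scores) if jaccard_scores else 0.0
    return {"accuracy": accuracy, "micro_precision": micro_precision, "macro_recall": macro_recall}
```

Micro-averaging pools counts across labels, so frequent labels dominate it, while macro-averaging gives every label equal weight. This is why the two can differ noticeably on datasets with many rare mechanism labels.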
Record ID: pubmed-4229970
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics, Research Article. Published online by BioMed Central, 2014-05-19.
Copyright © 2014 De Ferrari and Mitchell; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.