Enzyme mechanism prediction: a template matching problem on InterPro signature subspaces

BACKGROUND: We recently reported that one may be able to predict with high accuracy the chemical mechanism of an enzyme by employing a simple pattern recognition approach: a k Nearest Neighbour rule with k = 1 (k(1)NN) and 321 InterPro sequence signatures as enzyme features. The nearest-neighbour ru...
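The 1-nearest-neighbour rule mentioned in the summary above can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each enzyme is represented as the set of signatures it carries and that the Hamming distance (the number of signatures present in exactly one of the two enzymes) is used to compare them; the record does not state the distance measure. The signature names (`SIG_A`, ...) and mechanism labels are made-up placeholders, not real InterPro or MACiE entries.

```python
from typing import FrozenSet, List, Tuple

# An enzyme is modelled as the set of signature identifiers it contains.
# This set representation is an assumption made for illustration.
Signatures = FrozenSet[str]
LabelledEnzyme = Tuple[Signatures, str]


def hamming(a: Signatures, b: Signatures) -> int:
    """Count signatures present in exactly one of the two enzymes."""
    return len(a ^ b)


def predict_1nn(query: Signatures, training: List[LabelledEnzyme]) -> str:
    """Return the label of the single closest training enzyme (k = 1)."""
    _, label = min(training, key=lambda item: hamming(query, item[0]))
    return label


# Toy training set with hypothetical signatures and labels.
TRAINING: List[LabelledEnzyme] = [
    (frozenset({"SIG_A", "SIG_B"}), "mechanism_1"),
    (frozenset({"SIG_C"}), "mechanism_2"),
    (frozenset({"SIG_D", "SIG_E"}), "mechanism_3"),
]

if __name__ == "__main__":
    # A query sharing two signatures with the first training enzyme.
    print(predict_1nn(frozenset({"SIG_A", "SIG_B", "SIG_F"}), TRAINING))
```

When the signature sets of differently labelled enzymes barely overlap, as in this toy set, the nearest neighbour is almost always an enzyme with the same label, so prediction behaves much like a table look-up.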


Bibliographic Details
Main authors: Mussa, Hamse Y.; De Ferrari, Luna; Mitchell, John B. O.
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4669639/
https://www.ncbi.nlm.nih.gov/pubmed/26634450
http://dx.doi.org/10.1186/s13104-015-1730-7
author Mussa, Hamse Y.
De Ferrari, Luna
Mitchell, John B. O.
collection PubMed
description BACKGROUND: We recently reported that one may be able to predict with high accuracy the chemical mechanism of an enzyme by employing a simple pattern recognition approach: a k Nearest Neighbour rule with k = 1 (k(1)NN) and 321 InterPro sequence signatures as enzyme features. The nearest-neighbour rule is known to be highly sensitive to errors in the training data, in particular when the available training dataset is small. This was the case in our previous study, in which our dataset comprised 248 enzymes annotated against 71 enzymatic mechanism labels from the MACiE database. In the current study, we have carefully re-analysed our dataset and prediction results to “explain” why a high variance k(1)NN rule exhibited such remarkable classification performance. RESULTS: We find that enzymes with different chemical mechanism labels in this dataset reside in barely overlapping subspaces in the feature space defined by the 321 features selected. These features contain the appropriate information needed to accurately classify the enzymatic mechanisms, rendering our classification problem a basic look-up exercise. This observation dovetails with the low misclassification rate we reported. CONCLUSION: Our results provide explanations for the “anomaly”—a basic nearest-neighbour algorithm exhibiting remarkable prediction performance for enzymatic mechanism despite the fact that the feature space was large and sparse. Our results also dovetail well with another finding we reported, namely that InterPro signatures are critical for accurate prediction of enzyme mechanism. We also suggest simple rules that might enable one to inductively predict whether a novel enzyme possesses any of our 71 predefined mechanisms.
format Online
Article
Text
id pubmed-4669639
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4669639 2015-12-05 Enzyme mechanism prediction: a template matching problem on InterPro signature subspaces Mussa, Hamse Y. De Ferrari, Luna Mitchell, John B. O. BMC Res Notes Short Report
BioMed Central 2015-12-03 /pmc/articles/PMC4669639/ /pubmed/26634450 http://dx.doi.org/10.1186/s13104-015-1730-7 Text en © Mussa et al. 2015 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Enzyme mechanism prediction: a template matching problem on InterPro signature subspaces
topic Short Report