Towards structured output prediction of enzyme function

BACKGROUND: In this paper we describe work in progress on developing kernel methods for enzyme function prediction. Our focus is on developing so-called structured output prediction methods, where the enzymatic reaction is the combinatorial target object for prediction. We compared two structured ou...

Bibliographic Details
Main authors: Astikainen, Katja; Holm, Liisa; Pitkänen, Esa; Szedmak, Sandor; Rousu, Juho
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2654971/
https://www.ncbi.nlm.nih.gov/pubmed/19091049
author Astikainen, Katja
Holm, Liisa
Pitkänen, Esa
Szedmak, Sandor
Rousu, Juho
collection PubMed
description BACKGROUND: In this paper we describe work in progress on developing kernel methods for enzyme function prediction. Our focus is on developing so-called structured output prediction methods, where the enzymatic reaction is the combinatorial target object for prediction. We compared two structured output prediction methods, the Hierarchical Max-Margin Markov algorithm (HM³) and the Maximum Margin Regression algorithm (MMR), in hierarchical classification of enzyme function. As sequence features we use various string kernels and the GTG feature set derived from the global alignment trace graph of protein sequences. RESULTS: In our experiments on predicting enzyme EC classification, we obtain over 85% accuracy (predicting the four-digit EC code) and over 91% microlabel F1 score (predicting individual EC digits). In predicting the Gold Standard enzyme families, we obtain over 79% accuracy (predicting the family correctly) and over 89% microlabel F1 score (predicting superfamilies and families). In the latter case, structured output methods are significantly more accurate than a nearest neighbor classifier. A polynomial kernel over the GTG feature set turned out to be a prerequisite for accurate function prediction. Combining GTG with string kernels boosted accuracy slightly in the case of EC class prediction. CONCLUSION: Structured output prediction with GTG features is shown to be computationally feasible and to have accuracy on par with state-of-the-art approaches in enzyme function prediction.
format Text
id pubmed-2654971
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2654971 2009-03-13 Towards structured output prediction of enzyme function. Astikainen, Katja; Holm, Liisa; Pitkänen, Esa; Szedmak, Sandor; Rousu, Juho. BMC Proc (Proceedings). BioMed Central 2008-12-17 /pmc/articles/PMC2654971/ /pubmed/19091049 Text en Copyright © 2008 Astikainen et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Towards structured output prediction of enzyme function
topic Proceedings
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2654971/
https://www.ncbi.nlm.nih.gov/pubmed/19091049