Performance of Hidden Markov Models in Recovering the Standard Classification of Glycoside Hydrolases
Main authors: | Rossi, Mariana Fonseca; Mello, Beatriz; Schrago, Carlos G |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | SAGE Publications, 2017 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5404901/ https://www.ncbi.nlm.nih.gov/pubmed/28469382 http://dx.doi.org/10.1177/1176934317703401 |
author | Rossi, Mariana Fonseca; Mello, Beatriz; Schrago, Carlos G |
---|---|
collection | PubMed |
description | Glycoside hydrolases (GHs) are carbohydrate-active enzymes that catalyze the hydrolysis of glycosidic bonds, breaking complex sugars down into simpler carbohydrates. The current standard GH family classification is available in the CAZy database; it is based on amino acid sequence similarity and curated semi-automatically. However, with the exponential increase in data from genome sequencing, automated classification methods are required for the fast annotation of coding sequences. Currently, the dbCAN database offers automatic annotation of signature domains from CAZy-defined classifications using a statistical approach, hidden Markov models (HMMs). However, dbCAN does not contain the entire set of CAZy GH families. Moreover, no study has so far evaluated whether HMM profiles can be used to automatically assign GH amino acid sequences to the standard CAZy GH family classification itself. In this work, we performed a meta-analysis in which amino acid sequences from CAZy-defined GH families were used to build family-specific HMM profiles. We then queried a set of ~300 000 GH sequences against our database of HMM profiles estimated from CAZy families, and conducted the same evaluation against the available dbCAN HMM profiles. Our profiles matched the standard CAZy classification in 65% of cases, whereas dbCAN HMMs matched it in 61%. We also provide an analysis of the types of errors commonly found when HMMs are used to recover CAZy-based classifications. Although HMMs performed well, further developments are necessary before GH classification can be fully automated, which would allow GH classification to be standardized across protein databases. |
id | pubmed-5404901 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-5404901 2017-05-03. Evol Bioinform Online, Original Research. SAGE Publications, 2017-04-20. Text, en. © The Author(s) 2017, http://creativecommons.org/licenses/by-nc/4.0/. This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage). |
topic | Original Research |
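The evaluation summarized in the abstract comes down to assigning each query sequence to its best-scoring family profile and counting how often that family agrees with CAZy. The following is a minimal sketch under stated assumptions, not the authors' pipeline. It assumes the profiles were searched with HMMER3 `hmmscan` and that results were written in `--tblout` tabular format (column 1 = profile name, column 3 = query name, column 6 = full-sequence bit score). It also assumes that the best hit is the one with the highest bit score and that sequences with no hit count as mismatches. The function names `best_hits_from_tblout` and `recovery_rate` and the toy data are hypothetical and serve only as illustration.

```python
from typing import Dict, Iterable, Tuple


def best_hits_from_tblout(lines: Iterable[str]) -> Dict[str, str]:
    """Return the highest-scoring profile (family) for each query sequence.

    Assumption: rows follow HMMER3 hmmscan --tblout layout, where field 0 is
    the profile (target) name, field 2 the query name and field 5 the
    full-sequence bit score. Lines starting with '#' are comments.
    """
    best: Dict[str, Tuple[str, float]] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and tblout comment/header lines
        fields = line.split()
        family, query, score = fields[0], fields[2], float(fields[5])
        # Keep the hit with the highest bit score per query (assumed criterion).
        if query not in best or score > best[query][1]:
            best[query] = (family, score)
    return {query: family for query, (family, _) in best.items()}


def recovery_rate(predicted: Dict[str, str], reference: Dict[str, str]) -> float:
    """Percentage of reference sequences whose predicted family equals CAZy's.

    Sequences absent from `predicted` (no hit) count as mismatches.
    """
    if not reference:
        return 0.0
    matches = sum(1 for seq, fam in reference.items() if predicted.get(seq) == fam)
    return 100.0 * matches / len(reference)


# Hypothetical toy input: not data from the article.
toy_lines = [
    "# target name  accession  query name  accession  E-value  score",
    "GH5   -  seqA  -  1e-50  170.2",
    "GH30  -  seqA  -  1e-20   70.0",
    "GH13  -  seqB  -  1e-80  260.5",
    "GH5   -  seqC  -  1e-10   40.1",
]
reference = {"seqA": "GH5", "seqB": "GH13", "seqC": "GH26", "seqD": "GH1"}

predicted = best_hits_from_tblout(toy_lines)
print(recovery_rate(predicted, reference))  # prints 50.0
```

Using all reference sequences as the denominator, rather than only those with a hit, is a strict reading of "percentage of matches". Restricting the denominator to sequences that received a hit would report a higher figure for the same predictions.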