Homology to peptide pattern for annotation of carbohydrate-active enzymes and prediction of function

Bibliographic details
Main authors: Busk, P. K., Pilgaard, B., Lezyk, M. J., Meyer, A. S., Lange, L.
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5389127/
https://www.ncbi.nlm.nih.gov/pubmed/28403817
http://dx.doi.org/10.1186/s12859-017-1625-9
Collection: PubMed
Abstract:
BACKGROUND: Carbohydrate-active enzymes are found in all organisms and participate in key biological processes. These enzymes are classified into 274 families in the CAZy database, but the sequence diversity within each family makes it a major task to identify new family members and to provide a basis for predicting enzyme function. A fast and reliable method for de novo annotation of genes encoding carbohydrate-active enzymes is to identify conserved peptides in the curated enzyme families and then match these peptides to the sequence of interest, as demonstrated for the glycosyl hydrolase and lytic polysaccharide monooxygenase families. This approach not only assigns the enzymes to families but also predicts their function with high accuracy.
RESULTS: We identified conserved peptides for all enzyme families in the CAZy database using Peptide Pattern Recognition. The conserved peptides were matched to protein sequences for de novo annotation and functional prediction of carbohydrate-active enzymes with the Hotpep method. When protein sequences from 12 bacterial and 16 fungal genomes were assigned to families, Hotpep reached an accuracy of 0.84 (measured as F1-score) relative to the semiautomatic annotation in the CAZy database, whereas the HMM-based dbCAN method reached 0.77 with optimized parameters. Furthermore, Hotpep predicted the function of the annotated genes with 86% accuracy. Hotpep is available as a stand-alone application for MS Windows.
CONCLUSIONS: Hotpep is a state-of-the-art method for automatic annotation and functional prediction of carbohydrate-active enzymes.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-017-1625-9) contains supplementary material, which is available to authorized users.
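The abstract describes the core idea only at a high level: conserved peptides for each family are matched against a query protein, and the resulting family assignments are scored against CAZy with the F1-score. The Python sketch below illustrates that idea under loudly stated assumptions: the peptide lists, their scores, the family labels (`GH_A`, `GH_B`), the fixed peptide length `k`, and the `min_hits`/`min_score` thresholds are all hypothetical, and the scoring rule is a plausible simplification rather than Hotpep's published algorithm. The `f1_score` helper uses the standard definition on (protein, family) pairs; how the paper counts true and false positives may differ.

```python
# Hypothetical toy data: family -> {conserved peptide: score}.
# Real Hotpep peptide lists come from Peptide Pattern Recognition; these
# peptides, scores, and family labels are invented for illustration only.
FAMILY_PEPTIDES = {
    "GH_A": {"WNEPGN": 3, "GSGEWA": 2, "YDLNNE": 1},
    "GH_B": {"HGDPLE": 2, "TQNWLT": 2},
}


def score_families(sequence, family_peptides, k=6):
    """Return {family: (number of peptide hits, summed score)} for one query.

    Assumes every conserved peptide has length k (hypothetical simplification).
    """
    # Collect each k-mer of the query once (a set), so a peptide counts at most once.
    kmers = {sequence[i:i + k] for i in range(len(sequence) - k + 1)}
    results = {}
    for family, peptides in family_peptides.items():
        hits = [p for p in peptides if p in kmers]
        results[family] = (len(hits), sum(peptides[p] for p in hits))
    return results


def assign_family(sequence, family_peptides, min_hits=2, min_score=3, k=6):
    """Return the best-scoring family passing both hypothetical thresholds, or None."""
    best = None
    for family, (hits, score) in score_families(sequence, family_peptides, k).items():
        if hits >= min_hits and score >= min_score:
            if best is None or score > best[1]:
                best = (family, score)
    return best[0] if best else None


def f1_score(predicted, reference):
    """Standard F1 on sets of (protein_id, family) pairs."""
    tp = len(predicted & reference)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(reference)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(assign_family("MKTAWNEPGNLLGSGEWAQQ", FAMILY_PEPTIDES))  # prints GH_A
```

For example, the query `MKTAWNEPGNLLGSGEWAQQ` contains two of the toy `GH_A` peptides with a combined score of 5, so it is assigned to `GH_A`. Requiring both a minimum number of distinct peptide hits and a minimum summed score is one simple way to avoid calling a family from a single chance match.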
Record ID: pubmed-5389127
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Source: BMC Bioinformatics (Software), BioMed Central, published 2017-04-12
License: © The Author(s). 2017. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Subject: Software