
Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP)

BACKGROUND: Insight into the function of carbohydrate-active enzymes is required to understand their biological role and industrial potential. There is a need for better use of the ample genomic data in order to enable selection of the most interesting proteins for further studies. The basis for ela...


Bibliographic Details
Main Authors: Barrett, Kristian; Lange, Lene
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects: Methodology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6489277/
https://www.ncbi.nlm.nih.gov/pubmed/31168320
http://dx.doi.org/10.1186/s13068-019-1436-5
author Barrett, Kristian
Lange, Lene
collection PubMed
description BACKGROUND: Insight into the function of carbohydrate-active enzymes is required to understand their biological role and industrial potential. There is a need for better use of the ample genomic data in order to enable selection of the most interesting proteins for further studies. The basis for elaborating a new approach to sequence analysis is the hypothesis that when using conserved peptide patterns to determine the similarities between proteins, the exact spacing between conserved adjacent amino acids in the proteins plays a prominent functional role. Thus, the objective of developing the method of conserved unique peptide patterns (CUPP) is to construct a peptide-based grouping and validate the method to provide evidence that CUPP captures function-related features of the individual carbohydrate-active enzymes (as defined by CAZy families). This approach facilitates grouping of enzymes at a level lower than protein families and/or subfamilies. A standardized, efficient, and robust approach to functional annotation of carbohydrate-active enzymes would support improved molecular insight into enzyme–substrate interaction. RESULTS: A new nonalignment-based clustering and functional annotation tool was developed that uses conserved unique peptide patterns to perform automated clustering of proteins and formation of protein groups. A peptide-based model was constructed for each of these protein CUPP groups to be used to automatically annotate protein family, subfamily, and EC function of carbohydrate-active enzymes. CUPP prediction can annotate proteins (from any CAZy family) with high F-score to existing family (0.966), subfamily (0.961), and EC function (0.843). The speed of the CUPP program was estimated and exemplified by prediction of the 504,017 nonredundant proteins of CAZy in less than four CPU hours.
CONCLUSION: It was possible to construct an automated system for clustering proteins within families and use the resulting CUPP groups to directly build peptide-based models for genome annotation. The CUPP runtime, F-score, sensitivity, and precision of family and subfamily annotations match or improve on those of state-of-the-art tools. The speed of the CUPP annotation is similar to that of the rapid DIAMOND annotation tool. CUPP facilitates automated annotation of full genome assemblies to any CAZy family. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13068-019-1436-5) contains supplementary material, which is available to authorized users.
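As a reading aid for the F-score, sensitivity, and precision figures quoted in the abstract, the following is a minimal sketch of how the three measures relate. It assumes the conventional definition of the F-score as the harmonic mean of precision and sensitivity; the abstract does not state which variant the authors used. The function name and the counts in the usage line are hypothetical illustrations, not data from the article.

```python
def f_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F-score (assumed here to be the standard F1 measure).

    precision   = TP / (TP + FP)  -> share of predicted annotations that are correct
    sensitivity = TP / (TP + FN)  -> share of true annotations that were recovered
    F-score     = harmonic mean of precision and sensitivity
    """
    if true_pos == 0:
        # No correct predictions: precision and sensitivity are zero or
        # undefined, so the score is reported as 0.0 by convention.
        return 0.0
    precision = true_pos / (true_pos + false_pos)
    sensitivity = true_pos / (true_pos + false_neg)
    return 2 * precision * sensitivity / (precision + sensitivity)


# Hypothetical counts, chosen only to illustrate the calculation.
example = f_score(true_pos=90, false_pos=10, false_neg=10)
```

With these made-up counts, precision and sensitivity are both 0.9, so the harmonic mean is also 0.9. A score near 1, such as the family-level 0.966 reported above, requires both measures to be high at once.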
format Online
Article
Text
id pubmed-6489277
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6489277 2019-06-05 Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP) Barrett, Kristian Lange, Lene Biotechnol Biofuels Methodology BioMed Central 2019-04-30 /pmc/articles/PMC6489277/ /pubmed/31168320 http://dx.doi.org/10.1186/s13068-019-1436-5 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Peptide-based functional annotation of carbohydrate-active enzymes by conserved unique peptide patterns (CUPP)
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6489277/
https://www.ncbi.nlm.nih.gov/pubmed/31168320
http://dx.doi.org/10.1186/s13068-019-1436-5