cazy_webscraper: local compilation and interrogation of comprehensive CAZyme datasets

Bibliographic Details
Main authors: Hobbs, Emma E. M.; Gloster, Tracey M.; Pritchard, Leighton
Format: Online Article Text
Language: English
Published: Microbiology Society 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10483417/
https://www.ncbi.nlm.nih.gov/pubmed/37578822
http://dx.doi.org/10.1099/mgen.0.001086
_version_ 1785102382492811264
author Hobbs, Emma E. M.
Gloster, Tracey M.
Pritchard, Leighton
collection PubMed
description Carbohydrate active enzymes (CAZymes) are pivotal in biological processes including energy metabolism, cell structure maintenance, signalling, and pathogen recognition. Bioinformatic prediction and mining of CAZymes improves our understanding of these activities and enables discovery of candidates of interest for industrial biotechnology, particularly the processing of organic waste for biofuel production. CAZy (www.cazy.org) is a high-quality, manually curated, and authoritative database of CAZymes that is often the starting point for these analyses. Automated querying and integration of CAZy data with other public datasets would constitute a powerful resource for mining and exploring CAZyme diversity. However, CAZy does not itself provide methods to automate queries, or integrate annotation data from other sources (except by following hyperlinks) to support further analysis. To overcome these limitations we developed cazy_webscraper, a command-line tool that retrieves data from CAZy and other online resources to build a local, shareable and reproducible database that augments and extends the authoritative CAZy database. cazy_webscraper’s integration of curated CAZyme annotations with their corresponding protein sequences, up-to-date taxonomy assignments, and protein structure data facilitates automated large-scale and targeted bioinformatic CAZyme family analysis and candidate screening. This tool has found widespread uptake in the community, with over 35 000 downloads (from April 2021 to June 2023). We demonstrate the use and application of cazy_webscraper to: (i) augment, update and correct CAZy database accessions; (ii) explore the taxonomic distribution of CAZymes recorded in CAZy, identifying under-represented taxa and unusual CAZy class distributions; and (iii) investigate three CAZymes having potential biotechnological application for degradation of biomass, but lacking a representative structure in the PDB database. 
We describe in general how cazy_webscraper facilitates functional, structural and evolutionary studies to aid identification of candidate enzymes for further characterization, and specifically note that CAZy provides supporting evidence for recent expansion of the Auxiliary Activities (AA) CAZy family in eukaryotes, consistent with functions potentially specific to eukaryotic lifestyles.
format Online
Article
Text
id pubmed-10483417
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Microbiology Society
record_format MEDLINE/PubMed
spelling pubmed-10483417 2023-09-08 cazy_webscraper: local compilation and interrogation of comprehensive CAZyme datasets Hobbs, Emma E. M. Gloster, Tracey M. Pritchard, Leighton Microb Genom Research Articles Carbohydrate active enzymes (CAZymes) are pivotal in biological processes including energy metabolism, cell structure maintenance, signalling, and pathogen recognition. Bioinformatic prediction and mining of CAZymes improves our understanding of these activities and enables discovery of candidates of interest for industrial biotechnology, particularly the processing of organic waste for biofuel production. CAZy (www.cazy.org) is a high-quality, manually curated, and authoritative database of CAZymes that is often the starting point for these analyses. Automated querying and integration of CAZy data with other public datasets would constitute a powerful resource for mining and exploring CAZyme diversity. However, CAZy does not itself provide methods to automate queries, or integrate annotation data from other sources (except by following hyperlinks) to support further analysis. To overcome these limitations we developed cazy_webscraper, a command-line tool that retrieves data from CAZy and other online resources to build a local, shareable and reproducible database that augments and extends the authoritative CAZy database. cazy_webscraper’s integration of curated CAZyme annotations with their corresponding protein sequences, up-to-date taxonomy assignments, and protein structure data facilitates automated large-scale and targeted bioinformatic CAZyme family analysis and candidate screening. This tool has found widespread uptake in the community, with over 35 000 downloads (from April 2021 to June 2023). We demonstrate the use and application of cazy_webscraper to: (i) augment, update and correct CAZy database accessions; (ii) explore the taxonomic distribution of CAZymes recorded in CAZy, identifying under-represented taxa and unusual CAZy class distributions; and (iii) investigate three CAZymes having potential biotechnological application for degradation of biomass, but lacking a representative structure in the PDB database. We describe in general how cazy_webscraper facilitates functional, structural and evolutionary studies to aid identification of candidate enzymes for further characterization, and specifically note that CAZy provides supporting evidence for recent expansion of the Auxiliary Activities (AA) CAZy family in eukaryotes, consistent with functions potentially specific to eukaryotic lifestyles. Microbiology Society 2023-08-14 /pmc/articles/PMC10483417/ /pubmed/37578822 http://dx.doi.org/10.1099/mgen.0.001086 Text en © 2023 The Authors https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License. This article was made open access via a Publish and Read agreement between the Microbiology Society and the corresponding author’s institution.
title cazy_webscraper: local compilation and interrogation of comprehensive CAZyme datasets
topic Research Articles