
Reliable and Scalable Identification and Prioritization of Putative Cellulolytic Anaerobes With Large Genome Data

In the era of high-throughput sequencing, genetic information that carries hints of microbes’ functional niches is becoming easily accessible; however, properly identifying and characterizing these hints to infer the niches remains a challenge. Regar...


Bibliographic Details
Main Authors: Wang, Yubo; Li, Liguan; Xia, Yu; Zhang, Tong
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9580877/
https://www.ncbi.nlm.nih.gov/pubmed/36304268
http://dx.doi.org/10.3389/fbinf.2022.813771
author Wang, Yubo
Li, Liguan
Xia, Yu
Zhang, Tong
collection PubMed
description In the era of high-throughput sequencing, genetic information that carries hints of microbes’ functional niches is becoming easily accessible; however, properly identifying and characterizing these hints to infer the niches remains a challenge. In genome-centric interpretation of cellulose hydrolysis as a functional niche of anaerobes, a common practical problem is a lack of confidence in predicting an anaerobe’s real cellulolytic competency based solely on the abundances of the annotated carbohydrate-active enzyme modules or on its taxonomic affiliation. Recognizing the synergy machineries, which include but are not limited to the cellulosome gene clusters, is as important as annotating individual carbohydrate-active modules or genes. In interpreting the complete genomes of 2,768 microbial strains with well-documented phenotypes, and by incorporating automatic recognition of synergy among the annotated carbohydrate-active elements, an explicit genotype–phenotype correlation was shown to be feasible for cellulolytic anaerobes, and a bioinformatic pipeline was developed accordingly. This genome-centric pipeline categorizes putative cellulolytic anaerobes into six genotype groups based on differential cellulose-hydrolyzing capacity and varying synergy mechanisms. The genotype–phenotype correlation analysis suggested a finer categorization of the cellulosome gene clusters: although cellulosome complexes, by their nature, can enable the assembly of a number of carbohydrate-active units, they do not necessarily guarantee the formation of the cellulose–enzyme–microbe complex or the cellulose-hydrolyzing activity of the corresponding anaerobe strains, for example, the well-known Clostridium acetobutylicum strains. The analysis also identified the genetic foundation of a previously unrecognized machinery that may mediate microbe–cellulose adhesion, specifically, enzymes encoded by genes harboring both surface layer homology and cellulose-binding CBM modules. The applicability of this pipeline to scalable annotation of large genome datasets was further tested by annotating 7,902 reference genomes downloaded from NCBI, from which 14 genomes of putative paradigm cellulose-hydrolyzing anaerobes were identified. We believe the pipeline developed in this study would be a useful addition as a bioinformatic tool for genome-centric interpretation of uncultivated anaerobes, specifically regarding their functional niche of cellulose hydrolysis.
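The description mentions genes that harbor both surface layer homology and cellulose-binding CBM modules. As a rough illustration only, and not the authors' pipeline, the Python sketch below flags such genes from a per-gene list of module labels. The input layout, the function name `find_slh_cbm_genes`, the label `SLH`, and the choice of `CBM2` and `CBM3` as cellulose-binding families are all assumptions made for this example; the record itself does not specify them.

```python
from typing import Dict, FrozenSet, Iterable, List

# Assumed labels for illustration; the record does not name specific families.
SLH_LABEL = "SLH"
CELLULOSE_BINDING_CBMS: FrozenSet[str] = frozenset({"CBM2", "CBM3"})


def find_slh_cbm_genes(
    gene_modules: Dict[str, Iterable[str]],
    slh_label: str = SLH_LABEL,
    cbm_families: FrozenSet[str] = CELLULOSE_BINDING_CBMS,
) -> List[str]:
    """Return sorted IDs of genes carrying the SLH label and at least one listed CBM."""
    hits = []
    for gene_id, modules in gene_modules.items():
        module_set = set(modules)  # repeated modules on one gene count once
        has_slh = slh_label in module_set
        has_cbm = bool(module_set & cbm_families)
        if has_slh and has_cbm:
            hits.append(gene_id)
    return sorted(hits)


# Hypothetical annotation table: gene ID -> module labels found on that gene.
example = {
    "gene_001": ["GH9", "CBM3", "SLH"],
    "gene_002": ["GH5", "CBM3"],
    "gene_003": ["SLH"],
    "gene_004": ["CBM2", "SLH", "SLH"],
}

print(find_slh_cbm_genes(example))  # ['gene_001', 'gene_004']
```

Passing `cbm_families` as a parameter keeps the family list adjustable, since which CBM families count as cellulose-binding is a judgment the caller would need to make.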
format Online
Article
Text
id pubmed-9580877
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9580877 2022-10-26 Reliable and Scalable Identification and Prioritization of Putative Cellulolytic Anaerobes With Large Genome Data Wang, Yubo Li, Liguan Xia, Yu Zhang, Tong Front Bioinform Bioinformatics Frontiers Media S.A. 2022-03-17 /pmc/articles/PMC9580877/ /pubmed/36304268 http://dx.doi.org/10.3389/fbinf.2022.813771 Text en Copyright © 2022 Wang, Li, Xia and Zhang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Reliable and Scalable Identification and Prioritization of Putative Cellulolytic Anaerobes With Large Genome Data
topic Bioinformatics