Protein Language Models Uncover Carbohydrate-Active Enzyme Function in Metagenomics

In metagenomics, the pool of uncharacterized microbial enzymes presents a challenge for functional annotation. Among these, carbohydrate-active enzymes (CAZymes) stand out due to their pivotal roles in various biological processes related to host health and nutrition. Here, we present CAZyLingua, th...

Bibliographic Details
Main Authors: Thurimella, Kumar, Mohamed, Ahmed M. T., Graham, Daniel B., Owens, Róisín M., La Rosa, Sabina Leanti, Plichta, Damian R., Bacallado, Sergio, Xavier, Ramnik J.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634757/
https://www.ncbi.nlm.nih.gov/pubmed/37961379
http://dx.doi.org/10.1101/2023.10.23.563620
author Thurimella, Kumar
Mohamed, Ahmed M. T.
Graham, Daniel B.
Owens, Róisín M.
La Rosa, Sabina Leanti
Plichta, Damian R.
Bacallado, Sergio
Xavier, Ramnik J.
collection PubMed
description In metagenomics, the pool of uncharacterized microbial enzymes presents a challenge for functional annotation. Among these, carbohydrate-active enzymes (CAZymes) stand out due to their pivotal roles in various biological processes related to host health and nutrition. Here, we present CAZyLingua, the first tool that harnesses protein language model embeddings to build a deep learning framework that facilitates the annotation of CAZymes in metagenomic datasets. Our benchmarking results showed on average a higher F1 score (reflecting an average of precision and recall) on the annotated genomes of Bacteroides thetaiotaomicron, Eggerthella lenta and Ruminococcus gnavus compared to the traditional sequence homology-based method in dbCAN2. We applied our tool to a paired mother/infant longitudinal dataset and revealed unannotated CAZymes linked to microbial development during infancy. When applied to metagenomic datasets derived from patients affected by fibrosis-prone diseases such as Crohn’s disease and IgG4-related disease, CAZyLingua uncovered CAZymes associated with disease and healthy states. In each of these metagenomic catalogs, CAZyLingua discovered new annotations that were previously overlooked by traditional sequence homology tools. Overall, the deep learning model CAZyLingua can be applied in combination with existing tools to unravel intricate CAZyme evolutionary profiles and patterns, contributing to a more comprehensive understanding of microbial metabolic dynamics.
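The abstract describes the F1 score as reflecting an average of precision and recall; by the standard definition it is their harmonic mean, F1 = 2PR / (P + R). The sketch below is a minimal illustration of that definition only, not the authors' benchmarking code. The function name `f1_score` and the counts in the usage line are hypothetical and are not taken from the paper.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp, fp, fn are counts of true positives, false positives and
    false negatives. The counts are illustrative assumptions; the
    record does not report the benchmark's raw counts.
    """
    # Precision: fraction of predicted annotations that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: fraction of true annotations that were recovered.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: 8 correct calls, 2 spurious calls, 2 missed calls.
example = f1_score(tp=8, fp=2, fn=2)
```

Because the harmonic mean is pulled toward the smaller of the two values, a tool cannot reach a high F1 by maximizing precision alone or recall alone.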
format Online
Article
Text
id pubmed-10634757
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10634757 2023-11-13 bioRxiv Article
Cold Spring Harbor Laboratory 2023-10-25 /pmc/articles/PMC10634757/ /pubmed/37961379 http://dx.doi.org/10.1101/2023.10.23.563620 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Protein Language Models Uncover Carbohydrate-Active Enzyme Function in Metagenomics
topic Article