Protein Language Models Uncover Carbohydrate-Active Enzyme Function in Metagenomics
In metagenomics, the pool of uncharacterized microbial enzymes presents a challenge for functional annotation. Among these, carbohydrate-active enzymes (CAZymes) stand out due to their pivotal roles in various biological processes related to host health and nutrition. Here, we present CAZyLingua, th...
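Main authors: | Thurimella, Kumar; Mohamed, Ahmed M. T.; Graham, Daniel B.; Owens, Róisín M.; La Rosa, Sabina Leanti; Plichta, Damian R.; Bacallado, Sergio; Xavier, Ramnik J. |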
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634757/ https://www.ncbi.nlm.nih.gov/pubmed/37961379 http://dx.doi.org/10.1101/2023.10.23.563620 |
_version_ | 1785146236223881216 |
---|---|
author | Thurimella, Kumar Mohamed, Ahmed M. T. Graham, Daniel B. Owens, Róisín M. La Rosa, Sabina Leanti Plichta, Damian R. Bacallado, Sergio Xavier, Ramnik J. |
author_sort | Thurimella, Kumar |
collection | PubMed |
description | In metagenomics, the pool of uncharacterized microbial enzymes presents a challenge for functional annotation. Among these, carbohydrate-active enzymes (CAZymes) stand out due to their pivotal roles in various biological processes related to host health and nutrition. Here, we present CAZyLingua, the first tool that harnesses protein language model embeddings to build a deep learning framework that facilitates the annotation of CAZymes in metagenomic datasets. Our benchmarking results showed on average a higher F1 score (reflecting an average of precision and recall) on the annotated genomes of Bacteroides thetaiotaomicron, Eggerthella lenta and Ruminococcus gnavus compared to the traditional sequence homology-based method in dbCAN2. We applied our tool to a paired mother/infant longitudinal dataset and revealed unannotated CAZymes linked to microbial development during infancy. When applied to metagenomic datasets derived from patients affected by fibrosis-prone diseases such as Crohn’s disease and IgG4-related disease, CAZyLingua uncovered CAZymes associated with disease and healthy states. In each of these metagenomic catalogs, CAZyLingua discovered new annotations that were previously overlooked by traditional sequence homology tools. Overall, the deep learning model CAZyLingua can be applied in combination with existing tools to unravel intricate CAZyme evolutionary profiles and patterns, contributing to a more comprehensive understanding of microbial metabolic dynamics. |
format | Online Article Text |
id | pubmed-10634757 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10634757 2023-11-13 Protein Language Models Uncover Carbohydrate-Active Enzyme Function in Metagenomics Thurimella, Kumar Mohamed, Ahmed M. T. Graham, Daniel B. Owens, Róisín M. La Rosa, Sabina Leanti Plichta, Damian R. Bacallado, Sergio Xavier, Ramnik J. bioRxiv Article In metagenomics, the pool of uncharacterized microbial enzymes presents a challenge for functional annotation. Among these, carbohydrate-active enzymes (CAZymes) stand out due to their pivotal roles in various biological processes related to host health and nutrition. Here, we present CAZyLingua, the first tool that harnesses protein language model embeddings to build a deep learning framework that facilitates the annotation of CAZymes in metagenomic datasets. Our benchmarking results showed on average a higher F1 score (reflecting an average of precision and recall) on the annotated genomes of Bacteroides thetaiotaomicron, Eggerthella lenta and Ruminococcus gnavus compared to the traditional sequence homology-based method in dbCAN2. We applied our tool to a paired mother/infant longitudinal dataset and revealed unannotated CAZymes linked to microbial development during infancy. When applied to metagenomic datasets derived from patients affected by fibrosis-prone diseases such as Crohn’s disease and IgG4-related disease, CAZyLingua uncovered CAZymes associated with disease and healthy states. In each of these metagenomic catalogs, CAZyLingua discovered new annotations that were previously overlooked by traditional sequence homology tools. Overall, the deep learning model CAZyLingua can be applied in combination with existing tools to unravel intricate CAZyme evolutionary profiles and patterns, contributing to a more comprehensive understanding of microbial metabolic dynamics. Cold Spring Harbor Laboratory 2023-10-25 /pmc/articles/PMC10634757/ /pubmed/37961379 http://dx.doi.org/10.1101/2023.10.23.563620 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | Protein Language Models Uncover Carbohydrate-Active Enzyme Function in Metagenomics |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10634757/ https://www.ncbi.nlm.nih.gov/pubmed/37961379 http://dx.doi.org/10.1101/2023.10.23.563620 |
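
The description field in this record reports benchmarking by F1 score, which it glosses as an average of precision and recall. As a minimal sketch of how that metric is conventionally computed (the harmonic mean of the two), the snippet below works from raw counts. The function name `f1_score` and the counts are hypothetical illustrations; they are not taken from the paper or from CAZyLingua.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall).

    tp, fp, fn are counts of true positives, false positives and
    false negatives. Returns 0.0 when the score is undefined.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only (not results from the paper):
# 80 correctly annotated CAZymes, 20 spurious calls, 20 missed enzymes.
example = f1_score(tp=80, fp=20, fn=20)
print(f"F1 = {example:.2f}")
```

Because the harmonic mean is pulled toward the smaller of its two inputs, a tool cannot reach a high F1 by maximizing precision alone or recall alone.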