Deep embeddings to comprehend and visualize microbiome protein space

Bibliographic Details
Main authors: Odrzywolek, Krzysztof; Karwowska, Zuzanna; Majta, Jan; Byrski, Aleksander; Milanowska-Zabel, Kaja; Kosciolek, Tomasz
Format: Online article (text)
Language: English
Published: Nature Publishing Group UK 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9209496/
https://www.ncbi.nlm.nih.gov/pubmed/35725732
http://dx.doi.org/10.1038/s41598-022-14055-7

Collection: PubMed
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed

Abstract
Understanding the function of microbial proteins is essential to reveal the clinical potential of the microbiome. High-throughput sequencing technologies allow fast and increasingly inexpensive acquisition of data from microbial communities. However, many of the inferred protein sequences are novel and not catalogued, so conventional homology-based approaches can predict their function only to a limited extent, which calls for further research on alignment-free methods. Here, we leverage a deep-learning-based representation of proteins to assess its utility in alignment-free analysis of microbial proteins. We trained a language model on the Unified Human Gastrointestinal Protein catalogue and validated the resulting protein representation on the bacterial part of the SwissProt database. Finally, we present a use case on proteins involved in short-chain fatty acid (SCFA) metabolism. Results indicate that the deep learning model accurately represents features related to protein structure and function, enabling alignment-free protein analyses. Technologies that contextualize metagenomic data are a promising direction toward a deeper understanding of the microbiome.
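
The abstract's central idea, comparing proteins through learned vector representations instead of sequence alignments, can be illustrated with a minimal sketch. It assumes each protein has already been turned into a fixed-length embedding by some protein language model; the toy vectors, the labels `function_A` and `function_B`, and the helper names `cosine_similarity` and `nearest_label` are hypothetical. The nearest-neighbour label transfer shown here is a generic alignment-free technique, not necessarily the procedure used in the article.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Hypothetical sketch: nearest-neighbour label transfer in embedding space.
# Each vector stands in for a fixed-length protein embedding produced by a
# protein language model; all labels and numbers here are made up.

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity between two equal-length, non-zero vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def nearest_label(
    query: Vector, reference: Dict[str, List[Vector]]
) -> Tuple[Optional[str], float]:
    """Return the label of the most similar annotated embedding and its score.

    No sequence alignment is computed: proteins are compared only through
    their embedding vectors. Returns (None, -inf) for an empty reference set.
    """
    best_label: Optional[str] = None
    best_sim = -math.inf
    for label, vectors in reference.items():
        for vec in vectors:
            sim = cosine_similarity(query, vec)
            if sim > best_sim:
                best_label, best_sim = label, sim
    return best_label, best_sim


if __name__ == "__main__":
    # Toy "annotated" embeddings (hypothetical values, 3 dimensions for readability).
    reference = {
        "function_A": [[0.9, 0.1, 0.0], [0.8, 0.2, 0.1]],
        "function_B": [[0.0, 0.2, 0.9], [0.1, 0.1, 0.8]],
    }
    # Embedding of a hypothetical uncatalogued protein.
    query = [0.85, 0.15, 0.05]
    label, sim = nearest_label(query, reference)
    print(label, round(sim, 3))
```

Real protein embeddings usually have hundreds to thousands of dimensions, and at catalogue scale an approximate nearest-neighbour index or a trained classifier would typically replace this linear scan. Cosine similarity is used here only because it is a common, scale-invariant way to compare embedding vectors.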

Journal: Sci Rep (Scientific Reports), published online 2022-06-20
License: © The Author(s) 2022. Open Access: this article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.