
Hierarchical Microbial Functions Prediction by Graph Aggregated Embedding

Bibliographic Details
Main authors: Hou, Yujie; Zhang, Xiong; Zhou, Qinyan; Hong, Wenxing; Wang, Ying
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7874084/
https://www.ncbi.nlm.nih.gov/pubmed/33584804
http://dx.doi.org/10.3389/fgene.2020.608512
collection PubMed
description Matching 16S rRNA gene sequencing data to a metabolic reference database is a useful way to predict the metabolic functions of bacteria and archaea, giving greater insight into how a microbial community works. However, some operational taxonomic units (OTUs) cannot be functionally profiled this way, especially in microbial communities from non-human samples cultured in defective media. We therefore developed Hierarchical micrObial functions Prediction by graph aggregated Embedding (HOPE), which uses co-occurrence patterns and nucleotide sequences to predict microbial functions. HOPE integrates the topological structure of microbial co-occurrence networks with the k-mer composition of OTU sequences and embeds both into a lower-dimensional continuous latent space while maximally preserving the topological relationships among OTUs. The framework also accounts for the strong imbalance among the KEGG Orthology (KO) functions of microbes, which usually degrades prediction performance: a hierarchical multitask learning module alleviates the difficulty posed by the long-tailed class distribution. To evaluate HOPE, we compared it with HOPE-one, HOPE-seq, and GraphSAGE on three microbial metagenomic 16S rRNA sequencing datasets from the abalone gut, the human gut, and the gut of Penaeus monodon. HOPE outperformed the baselines on almost all metrics in all experiments and showed strong generalization ability. The basic idea of HOPE also applies to related scenarios, such as predicting gene function from gene co-expression networks. The source code of HOPE is freely available at https://github.com/adrift00/HOPE.
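The abstract only summarizes how sequence and network information are combined. As a rough illustration (not the authors' implementation, which is in the linked repository), the Python sketch below computes a normalized k-mer composition vector for each OTU sequence and then applies one GraphSAGE-style mean-aggregation step over a co-occurrence graph, concatenating each node's own features with the mean of its neighbours' features. The function names `kmer_composition` and `mean_aggregate`, the choice of k, the toy sequences, and the adjacency list are all hypothetical; HOPE's learned weights, its actual aggregator, and its hierarchical multitask head are not reproduced.

```python
from __future__ import annotations

from itertools import product

import numpy as np


def kmer_composition(seq: str, k: int = 3) -> np.ndarray:
    """Return the normalized k-mer frequency vector of a DNA sequence.

    Features are ordered lexicographically over ACGT (4**k dimensions).
    k-mers containing other symbols (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers))
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i : i + k])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def mean_aggregate(features: np.ndarray, adjacency: dict[int, list[int]]) -> np.ndarray:
    """One GraphSAGE-style mean-aggregation step without learned weights.

    Each node's output is its own feature vector concatenated with the mean
    of its neighbours' vectors (zeros for isolated nodes).
    """
    n, d = features.shape
    out = np.zeros((n, 2 * d))
    for v in range(n):
        neigh = adjacency.get(v, [])
        agg = features[neigh].mean(axis=0) if neigh else np.zeros(d)
        out[v] = np.concatenate([features[v], agg])
    return out


# Hypothetical OTU sequences and co-occurrence edges, for illustration only.
otus = ["ACGTACGT", "ACGTTTGA", "GGGCCCAA"]
X = np.stack([kmer_composition(s, k=2) for s in otus])  # shape (3, 16)
adj = {0: [1], 1: [0, 2], 2: [1]}
H = mean_aggregate(X, adj)
print(H.shape)  # (3, 32)
```

In a trained model, a learned linear map and nonlinearity would normally follow the concatenation, and several such layers would be stacked; the sketch stops at the aggregation itself so that the combination of sequence features and network topology stays visible.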
id pubmed-7874084
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Front Genet (Genetics)
date 2021-01-18
rights Copyright © 2021 Hou, Zhang, Zhou, Hong and Wang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), http://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
topic Genetics