Hierarchical Microbial Functions Prediction by Graph Aggregated Embedding
Matching 16S rRNA gene sequencing data to a metabolic reference database is a meaningful way to predict the metabolic function of bacteria and archaea, bringing greater insight into the workings of the microbial community. However, some operational taxonomic units (OTUs) cannot be functionally profiled,...
Main authors: | Hou, Yujie; Zhang, Xiong; Zhou, Qinyan; Hong, Wenxing; Wang, Ying |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2021 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7874084/ https://www.ncbi.nlm.nih.gov/pubmed/33584804 http://dx.doi.org/10.3389/fgene.2020.608512 |
_version_ | 1783649515802198016 |
---|---|
author | Hou, Yujie Zhang, Xiong Zhou, Qinyan Hong, Wenxing Wang, Ying |
author_sort | Hou, Yujie |
collection | PubMed |
description | Matching 16S rRNA gene sequencing data to a metabolic reference database is a meaningful way to predict the metabolic function of bacteria and archaea, bringing greater insight into the workings of the microbial community. However, some operational taxonomic units (OTUs) cannot be functionally profiled, especially for microbial communities from non-human samples cultured in defective media. We therefore report the development of Hierarchical micrObial functions Prediction by graph aggregated Embedding (HOPE), which uses co-occurrence patterns and nucleotide sequences to predict microbial functions. HOPE integrates the topological structure of microbial co-occurrence networks with the k-mer compositions of OTU sequences and embeds them into a lower-dimensional continuous latent space while maximally preserving topological relationships among OTUs. Our framework also accounts for the high imbalance among KEGG Orthology (KO) functions of microbes, which usually yields poor performance: a hierarchical multitask learning module is used in HOPE to alleviate the challenge posed by the long-tailed distribution among classes. To test the performance of HOPE, we compare it with HOPE-one, HOPE-seq, and GraphSAGE on three microbial metagenomic 16S rRNA sequencing datasets: abalone gut, human gut, and the gut of Penaeus monodon. Experiments demonstrate that HOPE outperforms the baselines on almost all indexes in all experiments. Furthermore, HOPE shows significant generalization ability. HOPE's basic idea is suitable for other related scenarios, such as the prediction of gene function based on gene co-expression networks. The source code of HOPE is freely available at https://github.com/adrift00/HOPE. |
format | Online Article Text |
id | pubmed-7874084 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-7874084 2021-02-11 Hierarchical Microbial Functions Prediction by Graph Aggregated Embedding Hou, Yujie Zhang, Xiong Zhou, Qinyan Hong, Wenxing Wang, Ying Front Genet Genetics Frontiers Media S.A. 2021-01-18 /pmc/articles/PMC7874084/ /pubmed/33584804 http://dx.doi.org/10.3389/fgene.2020.608512 Text en Copyright © 2021 Hou, Zhang, Zhou, Hong and Wang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Hierarchical Microbial Functions Prediction by Graph Aggregated Embedding |
title_sort | hierarchical microbial functions prediction by graph aggregated embedding |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7874084/ https://www.ncbi.nlm.nih.gov/pubmed/33584804 http://dx.doi.org/10.3389/fgene.2020.608512 |
work_keys_str_mv | AT houyujie hierarchicalmicrobialfunctionspredictionbygraphaggregatedembedding AT zhangxiong hierarchicalmicrobialfunctionspredictionbygraphaggregatedembedding AT zhouqinyan hierarchicalmicrobialfunctionspredictionbygraphaggregatedembedding AT hongwenxing hierarchicalmicrobialfunctionspredictionbygraphaggregatedembedding AT wangying hierarchicalmicrobialfunctionspredictionbygraphaggregatedembedding |
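
The description above mentions two ingredients that are easy to illustrate: the k-mer composition of an OTU sequence and the aggregation of node features over a co-occurrence network. The following is a minimal sketch of those two ideas only, not the authors' implementation (which is available at the GitHub URL in the record). The function names `kmer_composition` and `mean_aggregate`, the choice of k, and the use of a plain mean over neighbors (in the spirit of the GraphSAGE baseline named in the description) are all assumptions made for illustration.

```python
from itertools import product


def kmer_composition(seq, k=3):
    """Return the normalized k-mer frequency vector of a nucleotide sequence.

    Hypothetical helper: k-mers are ordered lexicographically over ACGT,
    and windows containing other characters (e.g. N) are skipped.
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0.0] * len(kmers)
    seq = seq.upper()
    total = 0
    for i in range(len(seq) - k + 1):
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1.0
            total += 1
    if total == 0:
        return counts
    return [c / total for c in counts]


def mean_aggregate(features, neighbors):
    """One mean-aggregation step over a co-occurrence graph.

    Assumption: each node's new representation is its own feature vector
    concatenated with the mean of its neighbors' vectors. No learned
    weights are applied here; a trained model would add them.
    """
    aggregated = {}
    for node, feat in features.items():
        nbrs = neighbors.get(node, [])
        if nbrs:
            mean = [
                sum(features[n][d] for n in nbrs) / len(nbrs)
                for d in range(len(feat))
            ]
        else:
            mean = [0.0] * len(feat)
        aggregated[node] = feat + mean
    return aggregated


if __name__ == "__main__":
    # Toy OTU sequences and a toy co-occurrence adjacency list (made up).
    otus = {"otu1": "ACGTACGT", "otu2": "ACGTTTGA", "otu3": "GGGGCCCC"}
    feats = {name: kmer_composition(s, k=2) for name, s in otus.items()}
    graph = {"otu1": ["otu2"], "otu2": ["otu1", "otu3"], "otu3": ["otu2"]}
    embedded = mean_aggregate(feats, graph)
    print(len(embedded["otu1"]))
```

In this sketch each OTU starts with a 4^k-dimensional composition vector, and one aggregation step doubles its length by appending the neighborhood mean. The hierarchical multitask module for long-tailed KO classes that the description mentions is not covered here.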