Multiclass Disease Classification from Microbial Whole-Community Metagenomes
The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we utilize 5643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease cla...
Main authors: | Khan, Saad; Kelly, Libusha |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7120658/ https://www.ncbi.nlm.nih.gov/pubmed/31797586 |
_version_ | 1783515023298002944 |
---|---|
author | Khan, Saad Kelly, Libusha |
author_facet | Khan, Saad Kelly, Libusha |
author_sort | Khan, Saad |
collection | PubMed |
description | The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we utilize 5643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease classifier of this scale, able to discriminate between 18 different diseases and healthy. We compared three different machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture which exploits the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (achieving 75% average test-set accuracy), receiver-operator-characteristics (92.1% average area-under-ROC (AUC)), and precision-recall (50% average area-under-precision-recall (AUPR)). Additionally, the convolutional net’s performance complements that of the random forest, showing a lower propensity for Type-I errors (false-positives) while the random forest makes fewer Type-II errors (false-negatives). Lastly, we are able to achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease-specific signatures across microbiomes that can be used for diagnostic purposes. |
format | Online Article Text |
id | pubmed-7120658 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-71206582020-04-03 Multiclass Disease Classification from Microbial Whole-Community Metagenomes Khan, Saad Kelly, Libusha Pac Symp Biocomput Article The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we utilize 5643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease classifier of this scale, able to discriminate between 18 different diseases and healthy. We compared three different machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture which exploits the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (achieving 75% average test-set accuracy), receiver-operator-characteristics (92.1% average area-under-ROC (AUC)), and precision-recall (50% average area-under-precision-recall (AUPR)). Additionally, the convolutional net’s performance complements that of the random forest, showing a lower propensity for Type-I errors (false-positives) while the random forest makes less Type-II errors (false-negatives). Lastly, we are able to achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease-specific signatures across microbiomes that can be used for diagnostic purposes. 2020 /pmc/articles/PMC7120658/ /pubmed/31797586 Text en http://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License |
spellingShingle | Article Khan, Saad Kelly, Libusha Multiclass Disease Classification from Microbial Whole-Community Metagenomes |
title | Multiclass Disease Classification from Microbial Whole-Community Metagenomes |
title_full | Multiclass Disease Classification from Microbial Whole-Community Metagenomes |
title_fullStr | Multiclass Disease Classification from Microbial Whole-Community Metagenomes |
title_full_unstemmed | Multiclass Disease Classification from Microbial Whole-Community Metagenomes |
title_short | Multiclass Disease Classification from Microbial Whole-Community Metagenomes |
title_sort | multiclass disease classification from microbial whole-community metagenomes |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7120658/ https://www.ncbi.nlm.nih.gov/pubmed/31797586 |
work_keys_str_mv | AT khansaad multiclassdiseaseclassificationfrommicrobialwholecommunitymetagenomes AT kellylibusha multiclassdiseaseclassificationfrommicrobialwholecommunitymetagenomes |
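
The description above reports both plain test-set accuracy and "top-3 accuracy". As a reading aid, the following is a minimal sketch of how top-k accuracy is commonly computed for a multiclass classifier: a prediction counts as correct when the true class is among the k highest-scoring classes. The function name `top_k_accuracy`, the score matrix, and the labels are hypothetical illustrations and are not taken from the article; the authors' own evaluation code may differ.

```python
from typing import Sequence


def top_k_accuracy(
    scores: Sequence[Sequence[float]],
    labels: Sequence[int],
    k: int = 3,
) -> float:
    """Fraction of samples whose true label is among the k top-scoring classes.

    scores[i][c] is the model's score for class c on sample i (hypothetical input).
    labels[i] is the index of the true class for sample i.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    if not labels:
        raise ValueError("at least one sample is required")

    hits = 0
    for row, true_label in zip(scores, labels):
        # Rank class indices from highest to lowest score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if true_label in ranked[:k]:
            hits += 1
    return hits / len(labels)


# Made-up scores for 4 samples over 5 classes (illustration only).
scores = [
    [0.10, 0.50, 0.20, 0.15, 0.05],  # true class 1 is ranked first
    [0.30, 0.25, 0.20, 0.15, 0.10],  # true class 2 is ranked third
    [0.05, 0.10, 0.15, 0.30, 0.40],  # true class 0 is ranked last
    [0.40, 0.10, 0.30, 0.15, 0.05],  # true class 3 is ranked third
]
labels = [1, 2, 0, 3]

top1 = top_k_accuracy(scores, labels, k=1)
top3 = top_k_accuracy(scores, labels, k=3)
print(f"top-1 accuracy: {top1:.2f}, top-3 accuracy: {top3:.2f}")
```

With k=1 this reduces to ordinary accuracy, which is why top-3 accuracy can never be lower than the plain accuracy measured on the same predictions.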