
Multiclass Disease Classification from Microbial Whole-Community Metagenomes

Bibliographic Details
Main authors: Khan, Saad, Kelly, Libusha
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7120658/
https://www.ncbi.nlm.nih.gov/pubmed/31797586
author Khan, Saad
Kelly, Libusha
collection PubMed
description The microbiome, the community of microorganisms living within an individual, is a promising avenue for developing non-invasive methods for disease screening and diagnosis. Here, we use 5643 aggregated, annotated whole-community metagenomes to implement the first multiclass microbiome disease classifier of this scale, able to discriminate between 18 different diseases and the healthy state. We compared three machine learning models: random forests, deep neural nets, and a novel graph convolutional architecture that takes the graph structure of phylogenetic trees as its input. We show that the graph convolutional model outperforms deep neural nets in terms of accuracy (75% average test-set accuracy), receiver operating characteristic (92.1% average area under the ROC curve (AUC)), and precision-recall (50% average area under the precision-recall curve (AUPR)). Additionally, the convolutional net’s performance complements that of the random forest: the convolutional net shows a lower propensity for Type-I errors (false positives), while the random forest makes fewer Type-II errors (false negatives). Lastly, we achieve over 90% average top-3 accuracy across all of our models. Together, these results indicate that there are predictive, disease-specific signatures across microbiomes that can be used for diagnostic purposes.
format Online
Article
Text
id pubmed-7120658
institution National Center for Biotechnology Information
language English
publishDate 2020
record_format MEDLINE/PubMed
spelling pubmed-7120658
2020-04-03
Multiclass Disease Classification from Microbial Whole-Community Metagenomes
Khan, Saad
Kelly, Libusha
Pac Symp Biocomput
Article
2020
/pmc/articles/PMC7120658/
/pubmed/31797586
Text
en
http://creativecommons.org/licenses/by-nc-nd/4.0/
Open Access chapter published by World Scientific Publishing Company and distributed under the terms of the Creative Commons Attribution Non-Commercial (CC BY-NC) 4.0 License
title Multiclass Disease Classification from Microbial Whole-Community Metagenomes
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7120658/
https://www.ncbi.nlm.nih.gov/pubmed/31797586
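
The abstract reports top-3 accuracy alongside Type-I (false-positive) and Type-II (false-negative) error tendencies. The following is a minimal sketch of how such quantities can be computed for a multiclass classifier. It is not the authors' evaluation code: the function names `top_k_accuracy` and `per_class_errors`, the one-vs-rest treatment of errors, and the toy score matrix are all illustrative assumptions.

```python
import numpy as np

def top_k_accuracy(scores, labels, k=3):
    """Fraction of samples whose true label is among the k highest-scoring classes."""
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    # Column indices of the k largest scores in each row (order within the k is irrelevant).
    top_k = np.argpartition(scores, -k, axis=1)[:, -k:]
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

def per_class_errors(pred, labels, n_classes):
    """One-vs-rest false-positive and false-negative counts for each class."""
    pred = np.asarray(pred)
    labels = np.asarray(labels)
    fp = np.array([np.sum((pred == c) & (labels != c)) for c in range(n_classes)])
    fn = np.array([np.sum((pred != c) & (labels == c)) for c in range(n_classes)])
    return fp, fn

# Made-up class scores for 4 samples over 5 classes (hypothetical data, not from the paper).
scores = np.array([
    [0.50, 0.20, 0.15, 0.10, 0.05],  # true class 0: ranked first
    [0.10, 0.40, 0.30, 0.15, 0.05],  # true class 2: ranked second
    [0.30, 0.25, 0.20, 0.15, 0.10],  # true class 4: ranked last
    [0.05, 0.10, 0.15, 0.62, 0.08],  # true class 3: ranked first
])
labels = np.array([0, 2, 4, 3])

pred = scores.argmax(axis=1)                 # top-1 prediction per sample
top1 = top_k_accuracy(scores, labels, k=1)   # 2 of 4 samples correct
top3 = top_k_accuracy(scores, labels, k=3)   # 3 of 4 samples correct
fp, fn = per_class_errors(pred, labels, n_classes=5)

print(f"top-1 accuracy: {top1:.2f}")
print(f"top-3 accuracy: {top3:.2f}")
print("false positives per class:", fp.tolist())
print("false negatives per class:", fn.tolist())
```

In this toy case the second sample is misclassified at top-1 but recovered at top-3, which is the kind of gap between plain accuracy and top-3 accuracy that the abstract describes. The per-class counts show how one misprediction registers as a false positive for the predicted class and a false negative for the true class.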