Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data
Main authors: | , , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
KeAi Publishing
2022
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8801753/ https://www.ncbi.nlm.nih.gov/pubmed/35155839 http://dx.doi.org/10.1016/j.synbio.2022.01.005 |
Summary: | Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approaches for predicting disease status based on metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status based on the microbial relative abundance profiles. Based on within-dataset cross-validation and cross-dataset prediction, we show that the RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes. |
---|
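
The evaluation design described in the summary (relative abundance profiles, within-dataset cross-validation, and cross-dataset prediction over six datasets) can be sketched roughly as below. This is only an illustrative outline, not the authors' pipeline: the function names, the `cohort_1` … `cohort_6` labels, the interleaved (non-stratified) fold assignment, and the choice of k are all assumptions made for the example. The classifier itself is left out; a random forest implementation such as scikit-learn's `RandomForestClassifier` could be trained on the resulting splits.

```python
from itertools import permutations


def relative_abundance(counts):
    """Convert raw per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no assigned reads")
    return {taxon: n / total for taxon, n in counts.items()}


def kfold_indices(n_samples, k):
    """Yield (train, test) index lists for k-fold CV within one dataset.

    Assumption: simple interleaved folds, with no stratification by label.
    """
    if not 2 <= k <= n_samples:
        raise ValueError("k must be between 2 and the number of samples")
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_dataset_pairs(dataset_names):
    """Every ordered (train_dataset, test_dataset) pair of distinct datasets."""
    return list(permutations(dataset_names, 2))


# Hypothetical placeholder names for the six datasets mentioned in the summary.
DATASETS = [f"cohort_{i}" for i in range(1, 7)]
PAIRS = cross_dataset_pairs(DATASETS)
```

With six datasets this yields 30 ordered train/test pairs, so each dataset serves as the training set five times and as the test set five times.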