
Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data

Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approac...


Bibliographic Details
Main Authors: Gao, Yilin; Zhu, Zifan; Sun, Fengzhu
Format: Online Article Text
Language: English
Published: KeAi Publishing 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8801753/
https://www.ncbi.nlm.nih.gov/pubmed/35155839
http://dx.doi.org/10.1016/j.synbio.2022.01.005
author Gao, Yilin
Zhu, Zifan
Sun, Fengzhu
collection PubMed
description Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approaches for predicting disease status based on metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status based on the microbial relative abundance profiles. Based on within-dataset cross-validation and cross-dataset prediction, we show that the RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the microbial relative abundance profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes.
format Online
Article
Text
id pubmed-8801753
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher KeAi Publishing
record_format MEDLINE/PubMed
spelling pubmed-8801753 2022-02-11 Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data Gao, Yilin Zhu, Zifan Sun, Fengzhu Synth Syst Biotechnol Original Research Article Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approaches for predicting disease status based on metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status based on the microbial relative abundance profiles. Based on within-dataset cross-validation and cross-dataset prediction, we show that the RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the microbial relative abundance profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes. KeAi Publishing 2022-01-27 /pmc/articles/PMC8801753/ /pubmed/35155839 http://dx.doi.org/10.1016/j.synbio.2022.01.005 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data
topic Original Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8801753/
https://www.ncbi.nlm.nih.gov/pubmed/35155839
http://dx.doi.org/10.1016/j.synbio.2022.01.005
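The evaluation setup named in the description (relative abundance profiles, then cross-dataset prediction) can be sketched roughly as follows. This is a minimal illustration and not the authors' pipeline: the function names, the train-on-one/test-on-another pairing scheme, the use of AUC as the score, and the dataset labels are all assumptions made for the example, and the random forests classifier itself is left out (any classifier that outputs a score per sample could be plugged in between the split and the AUC step).

```python
from typing import Dict, Iterator, List, Sequence, Tuple


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Normalize per-taxon read counts of one sample so they sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        # No reads assigned: return an all-zero profile instead of dividing by zero.
        return {taxon: 0.0 for taxon in counts}
    return {taxon: c / total for taxon, c in counts.items()}


def cross_dataset_pairs(names: Sequence[str]) -> Iterator[Tuple[str, str]]:
    """Yield (train, test) dataset pairs with train != test.

    Assumption: a model is trained on one dataset and tested on another;
    the record does not state the exact scheme the authors used.
    """
    for train in names:
        for test in names:
            if train != test:
                yield train, test


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos: List[float] = [s for y, s in zip(labels, scores) if y == 1]
    neg: List[float] = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one case and one control")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical read counts for a single sample.
    print(relative_abundance({"taxon_a": 30, "taxon_b": 70}))
    # Hypothetical dataset labels; the study used six datasets.
    print(list(cross_dataset_pairs(["A", "B", "C"])))
    # Hypothetical classifier scores for two cases (1) and two controls (0).
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
```

Within-dataset cross-validation would follow the same pattern, with the split made over samples inside a single dataset rather than over dataset pairs.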