Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data
Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approac...
Main authors: | Gao, Yilin; Zhu, Zifan; Sun, Fengzhu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | KeAi Publishing 2022 |
Subjects: | Original Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8801753/ https://www.ncbi.nlm.nih.gov/pubmed/35155839 http://dx.doi.org/10.1016/j.synbio.2022.01.005 |
_version_ | 1784642532840308736 |
---|---|
author | Gao, Yilin Zhu, Zifan Sun, Fengzhu |
author_facet | Gao, Yilin Zhu, Zifan Sun, Fengzhu |
author_sort | Gao, Yilin |
collection | PubMed |
description | Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approaches for predicting disease status based on metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status based on the microbial relative abundance profiles. Based on within-dataset cross-validation and cross-dataset prediction, we show that the RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the microbial relative abundance profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes. |
format | Online Article Text |
id | pubmed-8801753 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | KeAi Publishing |
record_format | MEDLINE/PubMed |
spelling | pubmed-8801753 2022-02-11 Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data Gao, Yilin Zhu, Zifan Sun, Fengzhu Synth Syst Biotechnol Original Research Article Dysfunction of microbial communities in various human body sites has been shown to be associated with a variety of diseases, raising the possibility of predicting diseases based on metagenomic samples. Although many studies have investigated this problem, there is no consensus on the optimal approaches for predicting disease status based on metagenomic samples. Using six human gut metagenomic datasets consisting of large numbers of colorectal cancer patients and healthy controls from different countries, we investigated different software packages for extracting relative abundances of known microbial genomes and for integrating mapping and assembly approaches to obtain the relative abundance profiles of both known and novel genomes. The random forests (RF) classification algorithm was then used to predict colorectal cancer status based on the microbial relative abundance profiles. Based on within-dataset cross-validation and cross-dataset prediction, we show that the RF prediction performance using the microbial relative abundance profiles estimated by Centrifuge is generally higher than that using the microbial relative abundance profiles estimated by MetaPhlAn2 and Bracken. We also develop a novel method to integrate the relative abundance profiles of both known and novel microbial organisms to further increase the prediction performance for colorectal cancer from metagenomes. KeAi Publishing 2022-01-27 /pmc/articles/PMC8801753/ /pubmed/35155839 http://dx.doi.org/10.1016/j.synbio.2022.01.005 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Original Research Article Gao, Yilin Zhu, Zifan Sun, Fengzhu Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data |
title | Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data |
title_full | Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data |
title_fullStr | Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data |
title_full_unstemmed | Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data |
title_short | Increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data |
title_sort | increasing prediction performance of colorectal cancer disease status using random forests classification based on metagenomic shotgun sequencing data |
topic | Original Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8801753/ https://www.ncbi.nlm.nih.gov/pubmed/35155839 http://dx.doi.org/10.1016/j.synbio.2022.01.005 |
work_keys_str_mv | AT gaoyilin increasingpredictionperformanceofcolorectalcancerdiseasestatususingrandomforestsclassificationbasedonmetagenomicshotgunsequencingdata AT zhuzifan increasingpredictionperformanceofcolorectalcancerdiseasestatususingrandomforestsclassificationbasedonmetagenomicshotgunsequencingdata AT sunfengzhu increasingpredictionperformanceofcolorectalcancerdiseasestatususingrandomforestsclassificationbasedonmetagenomicshotgunsequencingdata |
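
The description above mentions relative abundance profiles, within-dataset cross-validation, and cross-dataset prediction. The following is a minimal Python sketch of those three ideas only, not the authors' pipeline: it does not run Centrifuge, MetaPhlAn2, Bracken, or a random forest. The function names, the taxon labels, the dataset names `D1` to `D3`, the round-robin fold assignment, and the reading of "cross-dataset" as "train on one dataset, test on another" are all assumptions made for illustration.

```python
from itertools import permutations


def relative_abundance(counts):
    """Convert per-taxon read counts into fractions that sum to 1."""
    total = sum(counts.values())
    if total == 0:
        # An empty sample has no defined profile; return zeros (assumption).
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


def kfold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for within-dataset k-fold CV.

    Samples are assigned to folds round-robin; a real analysis would
    usually shuffle and stratify by disease status first.
    """
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def cross_dataset_pairs(dataset_names):
    """List every ordered (train_dataset, test_dataset) pair of distinct datasets."""
    return list(permutations(dataset_names, 2))


# Hypothetical read counts for one sample.
profile = relative_abundance({"taxon_a": 30, "taxon_b": 60, "taxon_c": 10})

# Within-dataset evaluation: 3 folds over 6 samples.
folds = list(kfold_indices(6, 3))

# Cross-dataset evaluation: train on one dataset, test on another.
pairs = cross_dataset_pairs(["D1", "D2", "D3"])
```

In a fuller version, each train/test split would feed a classifier fitted on the training profiles and scored on the held-out ones; that step is left out here because the record does not give the model settings.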