Kernel principal components based cascade forest towards disease identification with human microbiota

BACKGROUND: Clinical evidence shows that many phenotypic traits of human disease, e.g., inflammation, obesity, HIV, and diabetes, are related to the gut microbiome. Through supervised classification, it is feasible to determine human disease states by revealing the intes...

Bibliographic Details
Main Authors: Zhou, Jiayu, Ye, Yanqing, Jiang, Jiang
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8697468/
https://www.ncbi.nlm.nih.gov/pubmed/34949186
http://dx.doi.org/10.1186/s12911-021-01705-5
author Zhou, Jiayu
Ye, Yanqing
Jiang, Jiang
collection PubMed
description BACKGROUND: Clinical evidence shows that many phenotypic traits of human disease, e.g., inflammation, obesity, HIV, and diabetes, are related to the gut microbiome. Through supervised classification, it is feasible to determine human disease states from the compositional information of the intestinal microbiota. However, because the abundance matrix of microbiome data is very sparse, an interpretable deep model, such as the deep forest model, is crucial for further representing and mining the data. Moreover, overfitting can still occur in the original deep forest model when it deals with such “large p, small n” biological data. Feature reduction is therefore considered as a way to improve the ensemble forest model, particularly for disease identification from the human microbiota. METHODS: In this work, we propose the kernel principal components based cascade forest method, called KPCCF, to classify the disease states of patients using taxonomic profiles of the microbiome at the family level. In detail, kernel principal component analysis is first used to reduce the original dimension of the human microbiota datasets. The reduced data are then fed into the cascade forest to make a preliminary discrimination of the disease state of the samples. RESULTS: The proposed KPCCF algorithm can represent small-scale, high-dimensional human microbiota datasets that have a sparse feature matrix. Systematic comparison experiments on 4 datasets demonstrate that our method consistently outperforms the state-of-the-art methods. CONCLUSION: Although application domains share some common characteristics, no one-size-fits-all solution exists. The traditional deep model has limitations in biological applications where the sample size is small relative to the number of dimensions. KPCCF distinguishes itself from the standard deep forest model by its excellent performance in the microbiota field. Additionally, compared to other dimensionality reduction methods, kernel principal component analysis is more suitable for microbiota datasets.
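The two-stage pipeline described in METHODS (kernel PCA for dimensionality reduction, then a cascade forest for classification) can be illustrated with a minimal sketch. This is not the authors' implementation: the RBF kernel, the number of components, the fixed two-level cascade, the forest sizes, and the synthetic sparse data are all assumptions chosen for illustration, and the helper names (`fit_cascade`, `predict_cascade`, `demo`) are hypothetical. It assumes NumPy and scikit-learn are installed.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def make_forests(seed):
    """One cascade level: a random forest and a more strongly randomized forest."""
    return [
        RandomForestClassifier(n_estimators=50, random_state=seed),
        ExtraTreesClassifier(n_estimators=50, max_features=1, random_state=seed),
    ]


def fit_cascade(X, y, n_levels=2, cv=3):
    """Fit a fixed-depth cascade (assumption: no adaptive stopping rule)."""
    levels, augmented = [], X
    for level in range(n_levels):
        forests = make_forests(seed=level)
        # Out-of-fold class probabilities become extra features for the next level.
        probas = [
            cross_val_predict(f, augmented, y, cv=cv, method="predict_proba")
            for f in forests
        ]
        for f in forests:
            f.fit(augmented, y)
        levels.append(forests)
        augmented = np.hstack([X] + probas)
    return levels


def predict_cascade(levels, X):
    """Pass samples through every level and average the last level's class vectors."""
    augmented = X
    for forests in levels:
        probas = [f.predict_proba(augmented) for f in forests]
        augmented = np.hstack([X] + probas)
    classes = levels[-1][0].classes_
    return classes[np.mean(probas, axis=0).argmax(axis=1)]


def demo():
    """Run the pipeline on synthetic sparse data (a stand-in for abundance profiles)."""
    rng = np.random.default_rng(0)
    n_samples, n_features = 60, 200  # "large p, small n"
    mask = rng.random((n_samples, n_features)) < 0.1  # roughly 90% zeros
    X = rng.random((n_samples, n_features)) * mask
    y = np.array([0, 1] * (n_samples // 2))
    X[y == 1, :10] += 0.5  # hypothetical signal in the first ten features

    X_train, X_test, y_train = X[:40], X[40:], y[:40]
    # Step 1: kernel PCA, fitted on the training samples only.
    kpca = KernelPCA(n_components=10, kernel="rbf")
    Z_train = kpca.fit_transform(X_train)
    Z_test = kpca.transform(X_test)
    # Step 2: cascade forest on the reduced representation.
    levels = fit_cascade(Z_train, y_train)
    return Z_train.shape, predict_cascade(levels, Z_test)


if __name__ == "__main__":
    shape, predictions = demo()
    print("reduced training shape:", shape)
    print("predicted labels:", predictions)
```

Fitting the kernel PCA on the training split only, and using out-of-fold probabilities between cascade levels, are design choices made here to limit information leakage with few samples; the abstract does not state how the authors handled either point.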
format Online
Article
Text
id pubmed-8697468
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8697468 2022-01-05 Kernel principal components based cascade forest towards disease identification with human microbiota Zhou, Jiayu Ye, Yanqing Jiang, Jiang BMC Med Inform Decis Mak Research Article BioMed Central 2021-12-23 /pmc/articles/PMC8697468/ /pubmed/34949186 http://dx.doi.org/10.1186/s12911-021-01705-5 Text en © The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Kernel principal components based cascade forest towards disease identification with human microbiota
topic Research Article