Kernel principal components based cascade forest towards disease identification with human microbiota
BACKGROUND: Numerous pieces of clinical evidence have shown that many phenotypic traits of human disease, such as inflammation, obesity, HIV, and diabetes, are related to the gut microbiome. Through supervised classification, it is feasible to determine human disease states by revealing the intes...
Main authors: | Zhou, Jiayu; Ye, Yanqing; Jiang, Jiang |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2021 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8697468/ https://www.ncbi.nlm.nih.gov/pubmed/34949186 http://dx.doi.org/10.1186/s12911-021-01705-5 |
_version_ | 1784620053036007424 |
---|---|
author | Zhou, Jiayu Ye, Yanqing Jiang, Jiang |
author_sort | Zhou, Jiayu |
collection | PubMed |
description | BACKGROUND: Numerous pieces of clinical evidence have shown that many phenotypic traits of human disease, such as inflammation, obesity, HIV, and diabetes, are related to the gut microbiome. Through supervised classification, it is feasible to determine human disease states from the compositional information of the intestinal microbiota. However, because the abundance matrix of microbiome data is so sparse, an interpretable deep model, such as the deep forest model, is crucial for further representing and mining the data. Moreover, overfitting can still occur in the original deep forest model when dealing with such “large p, small n” biological data. Feature reduction is therefore considered as a way to improve the ensemble forest model, especially for disease identification with the human microbiota. METHODS: In this work, we propose the kernel principal components based cascade forest method, called KPCCF, to classify the disease states of patients by using taxonomic profiles of the microbiome at the family level. In detail, kernel principal component analysis is first used to reduce the original dimension of the human microbiota datasets. The processed data are then fed into the cascade forest to make a preliminary discrimination of the disease state of the samples. RESULTS: The proposed KPCCF algorithm can represent small-scale, high-dimensional human microbiota datasets that have a sparse feature matrix. Systematic comparison experiments on 4 datasets demonstrate that our method consistently outperforms the state-of-the-art methods. CONCLUSION: Despite sharing some common characteristics, a one-size-fits-all solution does not exist in any space. The traditional deep model has limitations in biological applications where a small number of samples is out of balance with a high number of dimensions. KPCCF is distinguished from the standard deep forest model by its excellent performance in the microbiota field. Additionally, compared to other dimensionality reduction methods, kernel principal component analysis is more suitable for microbiota datasets. |
format | Online Article Text |
id | pubmed-8697468 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8697468 2022-01-05 Kernel principal components based cascade forest towards disease identification with human microbiota Zhou, Jiayu Ye, Yanqing Jiang, Jiang BMC Med Inform Decis Mak Research Article BioMed Central 2021-12-23 /pmc/articles/PMC8697468/ /pubmed/34949186 http://dx.doi.org/10.1186/s12911-021-01705-5 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Article Zhou, Jiayu Ye, Yanqing Jiang, Jiang Kernel principal components based cascade forest towards disease identification with human microbiota |
title | Kernel principal components based cascade forest towards disease identification with human microbiota |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8697468/ https://www.ncbi.nlm.nih.gov/pubmed/34949186 http://dx.doi.org/10.1186/s12911-021-01705-5 |
work_keys_str_mv | AT zhoujiayu kernelprincipalcomponentsbasedcascadeforesttowardsdiseaseidentificationwithhumanmicrobiota AT yeyanqing kernelprincipalcomponentsbasedcascadeforesttowardsdiseaseidentificationwithhumanmicrobiota AT jiangjiang kernelprincipalcomponentsbasedcascadeforesttowardsdiseaseidentificationwithhumanmicrobiota |
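
The two-stage pipeline described in the abstract (kernel principal component analysis for dimension reduction, followed by a cascade forest) can be illustrated with a minimal sketch. This is not the authors' KPCCF implementation: the RBF kernel, the number of components, the two-forest layout per level, the fixed number of levels, and the function names `make_forests` and `fit_predict_cascade` are all assumptions chosen for illustration, and the data are synthetic. The sketch uses scikit-learn and NumPy.

```python
import numpy as np
from sklearn.decomposition import KernelPCA
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_predict


def make_forests(seed):
    # Assumed layout: one random forest and one more strongly randomized forest per level.
    return [
        RandomForestClassifier(n_estimators=50, random_state=seed),
        ExtraTreesClassifier(n_estimators=50, max_features=1, random_state=seed),
    ]


def fit_predict_cascade(X_train, y_train, X_test, n_levels=2, n_components=10):
    # Step 1: kernel PCA reduces the sparse, high-dimensional abundance matrix.
    kpca = KernelPCA(n_components=n_components, kernel="rbf")
    Z_train = kpca.fit_transform(X_train)
    Z_test = kpca.transform(X_test)

    aug_train, aug_test = Z_train, Z_test
    for level in range(n_levels):
        train_probs, test_probs = [], []
        for forest in make_forests(seed=level):
            # Out-of-fold probabilities stop training labels leaking into the next level.
            train_probs.append(
                cross_val_predict(forest, aug_train, y_train, cv=3, method="predict_proba")
            )
            forest.fit(aug_train, y_train)
            test_probs.append(forest.predict_proba(aug_test))
        # Step 2: each level hands its class-probability vectors, joined with the
        # reduced features, to the next level of the cascade.
        aug_train = np.hstack([Z_train] + train_probs)
        aug_test = np.hstack([Z_test] + test_probs)

    # Final prediction: average the class probabilities of the last level's forests.
    mean_probs = np.mean(test_probs, axis=0)
    classes = np.unique(y_train)
    return classes[np.argmax(mean_probs, axis=1)]


# Synthetic stand-in for a "large p, small n" abundance table (not real microbiome data).
rng = np.random.default_rng(0)
X = rng.poisson(0.3, size=(60, 200)).astype(float)  # 60 samples, 200 taxa, mostly zeros
y = np.repeat([0, 1], 30)
X[y == 1, :20] += rng.poisson(2.0, size=(30, 20))   # shift a few taxa in one class
X = X / X.sum(axis=1, keepdims=True)                # convert counts to relative abundance

order = rng.permutation(len(y))
train_idx, test_idx = order[:45], order[45:]
predictions = fit_predict_cascade(X[train_idx], y[train_idx], X[test_idx])
print(predictions)
```

A real cascade forest typically grows levels until validation performance stops improving; the fixed `n_levels` here only keeps the sketch short.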