BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data

BACKGROUND: The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning-based approaches. Recently, the deep forest mode...

Bibliographic Details
Main authors: Guo, Yang; Liu, Shuhui; Li, Zhanhuai; Shang, Xuequn
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5907304/
https://www.ncbi.nlm.nih.gov/pubmed/29671390
http://dx.doi.org/10.1186/s12859-018-2095-4
_version_ 1783315504042082304
author Guo, Yang
Liu, Shuhui
Li, Zhanhuai
Shang, Xuequn
collection PubMed
description BACKGROUND: The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning-based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks that learns hyper-representations using cascaded ensembles of decision trees. The deep forest model has been shown to achieve competitive or even better performance than deep neural networks to some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small-sample-size, high-dimensional biology data. RESULTS: In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biology datasets; it can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a multi-class-grained scanning method is proposed that trains multiple binary classifiers to encourage ensemble diversity; meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy that emphasizes more important features in the cascade forests, thereby propagating the benefits of discriminative features among cascade layers to improve classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms state-of-the-art methods in cancer subtype classification. CONCLUSIONS: The multi-class-grained scanning and boosting strategies in our model provide an effective solution to ease the overfitting challenge and improve the robustness of the deep forest model on small-scale data. Our model provides a useful approach to the classification of cancer subtypes using deep learning on high-dimensional, small-scale biology data.
format Online
Article
Text
id pubmed-5907304
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5907304 2018-04-30 BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data Guo, Yang Liu, Shuhui Li, Zhanhuai Shang, Xuequn BMC Bioinformatics Research BACKGROUND: The classification of cancer subtypes is of great importance to cancer diagnosis and therapy. Many supervised learning approaches have been applied to cancer subtype classification in the past few years, especially deep learning-based approaches. Recently, the deep forest model has been proposed as an alternative to deep neural networks that learns hyper-representations using cascaded ensembles of decision trees. The deep forest model has been shown to achieve competitive or even better performance than deep neural networks to some extent. However, the standard deep forest model may face overfitting and ensemble diversity challenges when dealing with small-sample-size, high-dimensional biology data. RESULTS: In this paper, we propose a deep learning model, called BCDForest, to address cancer subtype classification on small-scale biology datasets; it can be viewed as a modification of the standard deep forest model. BCDForest differs from the standard deep forest model in two main contributions. First, a multi-class-grained scanning method is proposed that trains multiple binary classifiers to encourage ensemble diversity; meanwhile, the fitting quality of each classifier is considered in representation learning. Second, we propose a boosting strategy that emphasizes more important features in the cascade forests, thereby propagating the benefits of discriminative features among cascade layers to improve classification performance. Systematic comparison experiments on both microarray and RNA-Seq gene expression datasets demonstrate that our method consistently outperforms state-of-the-art methods in cancer subtype classification. CONCLUSIONS: The multi-class-grained scanning and boosting strategies in our model provide an effective solution to ease the overfitting challenge and improve the robustness of the deep forest model on small-scale data. Our model provides a useful approach to the classification of cancer subtypes using deep learning on high-dimensional, small-scale biology data. BioMed Central 2018-04-11 /pmc/articles/PMC5907304/ /pubmed/29671390 http://dx.doi.org/10.1186/s12859-018-2095-4 Text en © The Author(s). 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title BCDForest: a boosting cascade deep forest model towards the classification of cancer subtypes based on gene expression data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5907304/
https://www.ncbi.nlm.nih.gov/pubmed/29671390
http://dx.doi.org/10.1186/s12859-018-2095-4