Hybrid deep learning approach to improve classification of low-volume high-dimensional data

BACKGROUND: The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well in high-dimensional datasets. Deep learning has sh...

Bibliographic Details
Main authors:	Mavaie, Pegah; Holder, Lawrence; Skinner, Michael K.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2023
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631218/ https://www.ncbi.nlm.nih.gov/pubmed/37936066 http://dx.doi.org/10.1186/s12859-023-05557-w

author	Mavaie, Pegah; Holder, Lawrence; Skinner, Michael K.
collection	PubMed
description	BACKGROUND: The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well in high-dimensional datasets. Deep learning has shown success in feature generation but requires large datasets to achieve high classification accuracy. Biology domains typically exhibit these challenges with numerous handcrafted features (high-dimensional) and small amounts of training data (low volume). METHOD: A hybrid learning approach is proposed that first trains a deep network on the training data, extracts features from the deep network, and then uses these features to re-express the data for input to a non-deep learning method, which is trained to perform the final classification. RESULTS: The approach is systematically evaluated to determine the best layer of the deep learning network from which to extract features and the threshold on training data volume that prefers this approach. Results from several domains show that this hybrid approach outperforms standalone deep and non-deep learning methods, especially on low-volume, high-dimensional datasets. The diverse collection of datasets further supports the robustness of the approach across different domains. CONCLUSIONS: The hybrid approach combines the strengths of deep and non-deep learning paradigms to achieve high performance on high-dimensional, low volume learning tasks that are typical in biology domains.
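The METHOD step above (train a deep network, extract features from one of its layers, then train a non-deep learner on the re-expressed data) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the article: the one-hidden-layer network, the 3-nearest-neighbour classifier used as the non-deep method, the synthetic 50-feature dataset, and the helper names `train_mlp`, `hidden_features` and `knn_predict`.

```python
import math
import random


def train_mlp(X, y, hidden=4, epochs=200, lr=0.1, seed=0):
    """Train a tiny network (tanh hidden layer, sigmoid output) with plain SGD.

    Returns only the hidden-layer parameters, since the hybrid approach
    reuses that layer as a feature extractor.
    """
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.1, 0.1) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.1, 0.1) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = hidden_features(x, W1, b1)
            z = sum(w * hi for w, hi in zip(W2, h)) + b2
            z = max(-60.0, min(60.0, z))  # clamp to keep exp() in range
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - t  # cross-entropy gradient w.r.t. the output pre-activation
            for j in range(hidden):
                gh = g * W2[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                W2[j] -= lr * g * h[j]
                for i in range(d):
                    W1[j][i] -= lr * gh * x[i]
                b1[j] -= lr * gh
            b2 -= lr * g
    return W1, b1


def hidden_features(x, W1, b1):
    """Re-express one sample as the activations of the hidden layer."""
    return [math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(W1, b1)]


def knn_predict(train_F, train_y, f, k=3):
    """Non-deep classifier (assumed here): majority vote of the k nearest neighbours."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(range(len(train_F)), key=lambda i: sq_dist(train_F[i], f))[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > k else 0


# Synthetic low-volume, high-dimensional data (hypothetical, not the paper's):
# 30 training samples, 50 features, only the first 5 carry class signal.
data_rng = random.Random(1)


def make_sample(label):
    shift = 1.0 if label else -1.0
    return [data_rng.gauss(shift if i < 5 else 0.0, 1.0) for i in range(50)]


y_train = [i % 2 for i in range(30)]
X_train = [make_sample(t) for t in y_train]
y_test = [i % 2 for i in range(10)]
X_test = [make_sample(t) for t in y_test]

# Step 1: train the deep network. Step 2: extract hidden-layer features.
W1, b1 = train_mlp(X_train, y_train)
F_train = [hidden_features(x, W1, b1) for x in X_train]
F_test = [hidden_features(x, W1, b1) for x in X_test]

# Step 3: train and apply the non-deep method on the extracted features.
preds = [knn_predict(F_train, y_train, f) for f in F_test]
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
```

In this sketch the 50 raw features are compressed to 4 learned ones before the nearest-neighbour step. Which layer to extract from and which non-deep learner to pair it with are the choices the RESULTS section says the article evaluates; the sketch fixes both arbitrarily.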
format	Online Article Text
id	pubmed-10631218
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-10631218 2023-11-07 Hybrid deep learning approach to improve classification of low-volume high-dimensional data Mavaie, Pegah; Holder, Lawrence; Skinner, Michael K. BMC Bioinformatics Research
BioMed Central 2023-11-07 /pmc/articles/PMC10631218/ /pubmed/37936066 http://dx.doi.org/10.1186/s12859-023-05557-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Hybrid deep learning approach to improve classification of low-volume high-dimensional data
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631218/ https://www.ncbi.nlm.nih.gov/pubmed/37936066 http://dx.doi.org/10.1186/s12859-023-05557-w
