Hybrid deep learning approach to improve classification of low-volume high-dimensional data

Bibliographic Details
Main authors: Mavaie, Pegah; Holder, Lawrence; Skinner, Michael K.
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631218/
https://www.ncbi.nlm.nih.gov/pubmed/37936066
http://dx.doi.org/10.1186/s12859-023-05557-w
_version_ 1785132326704906240
author Mavaie, Pegah
Holder, Lawrence
Skinner, Michael K.
author_sort Mavaie, Pegah
collection PubMed
description BACKGROUND: The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well in high-dimensional datasets. Deep learning has shown success in feature generation but requires large datasets to achieve high classification accuracy. Biology domains typically exhibit these challenges with numerous handcrafted features (high-dimensional) and small amounts of training data (low-volume). METHOD: A hybrid learning approach is proposed that first trains a deep network on the training data, extracts features from the deep network, and then uses these features to re-express the data for input to a non-deep learning method, which is trained to perform the final classification. RESULTS: The approach is systematically evaluated to determine the best layer of the deep learning network from which to extract features and the training data volume threshold at which this approach becomes preferable. Results from several domains show that this hybrid approach outperforms standalone deep and non-deep learning methods, especially on low-volume, high-dimensional datasets. The diverse collection of datasets further supports the robustness of the approach across different domains. CONCLUSIONS: The hybrid approach combines the strengths of deep and non-deep learning paradigms to achieve high performance on high-dimensional, low-volume learning tasks that are typical in biology domains.
format Online
Article
Text
id pubmed-10631218
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10631218 2023-11-07 Hybrid deep learning approach to improve classification of low-volume high-dimensional data Mavaie, Pegah Holder, Lawrence Skinner, Michael K. BMC Bioinformatics Research
BioMed Central 2023-11-07 /pmc/articles/PMC10631218/ /pubmed/37936066 http://dx.doi.org/10.1186/s12859-023-05557-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Mavaie, Pegah
Holder, Lawrence
Skinner, Michael K.
Hybrid deep learning approach to improve classification of low-volume high-dimensional data
title Hybrid deep learning approach to improve classification of low-volume high-dimensional data
title_sort hybrid deep learning approach to improve classification of low-volume high-dimensional data
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631218/
https://www.ncbi.nlm.nih.gov/pubmed/37936066
http://dx.doi.org/10.1186/s12859-023-05557-w
work_keys_str_mv AT mavaiepegah hybriddeeplearningapproachtoimproveclassificationoflowvolumehighdimensionaldata
AT holderlawrence hybriddeeplearningapproachtoimproveclassificationoflowvolumehighdimensionaldata
AT skinnermichaelk hybriddeeplearningapproachtoimproveclassificationoflowvolumehighdimensionaldata
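
The METHOD portion of the abstract describes a three-step pipeline: train a deep network, extract features from one of its layers, and train a non-deep learner on the re-expressed data. The following is a minimal sketch of that idea, not the authors' implementation. Everything in it is an assumption made for illustration: the synthetic data generator `make_data`, the small tanh network with hidden sizes `(16, 8)`, the learning rate and epoch count, and the use of k-nearest neighbors as a stand-in for the non-deep method (the abstract does not name one).

```python
import numpy as np

def make_data(n, d, rng):
    """Hypothetical synthetic data: few samples, many features, two classes."""
    y = rng.integers(0, 2, size=n)
    X = rng.normal(size=(n, d))
    X[:, :5] += 1.5 * (2 * y[:, None] - 1)  # only the first 5 features carry signal
    return X, y

def train_mlp(X, y, hidden=(16, 8), lr=0.1, epochs=300, seed=0):
    """Train a small tanh MLP with a sigmoid output by full-batch gradient descent."""
    rng = np.random.default_rng(seed)
    sizes = [X.shape[1], *hidden, 1]
    Ws = [rng.normal(scale=np.sqrt(1.0 / m), size=(m, n))
          for m, n in zip(sizes[:-1], sizes[1:])]
    bs = [np.zeros(n) for n in sizes[1:]]
    t = y.reshape(-1, 1).astype(float)
    for _ in range(epochs):
        # Forward pass, keeping every layer's activations for backpropagation.
        acts = [X]
        for W, b in zip(Ws[:-1], bs[:-1]):
            acts.append(np.tanh(acts[-1] @ W + b))
        p = 1.0 / (1.0 + np.exp(-(acts[-1] @ Ws[-1] + bs[-1])))
        # Gradient of the mean cross-entropy with respect to the output logit.
        delta = (p - t) / len(X)
        for i in range(len(Ws) - 1, -1, -1):
            gW = acts[i].T @ delta
            gb = delta.sum(axis=0)
            if i > 0:  # propagate through the tanh of the layer below
                delta = (delta @ Ws[i].T) * (1.0 - acts[i] ** 2)
            Ws[i] -= lr * gW
            bs[i] -= lr * gb
    return Ws, bs

def hidden_features(X, Ws, bs, layer):
    """Return the activations of hidden layer `layer` (1-based) for inputs X."""
    a = X
    for W, b in zip(Ws[:layer], bs[:layer]):
        a = np.tanh(a @ W + b)
    return a

def knn_predict(F_train, y_train, F_test, k=3):
    """Majority vote among the k nearest training points (binary labels 0/1)."""
    d2 = ((F_test[:, None, :] - F_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d2, axis=1)[:, :k]
    return (y_train[nearest].mean(axis=1) > 0.5).astype(int)

rng = np.random.default_rng(0)
X_train, y_train = make_data(60, 200, rng)   # low volume, high dimension
X_test, y_test = make_data(200, 200, rng)

# Step 1: train the deep network on the training data.
Ws, bs = train_mlp(X_train, y_train)

# Steps 2 and 3: extract features from each hidden layer, then train and
# evaluate the non-deep learner on the re-expressed data.
for layer in (1, 2):
    F_train = hidden_features(X_train, Ws, bs, layer)
    F_test = hidden_features(X_test, Ws, bs, layer)
    acc = (knn_predict(F_train, y_train, F_test) == y_test).mean()
    print(f"layer {layer}: hybrid k-NN accuracy = {acc:.2f}")
```

Looping over `layer` mirrors the question the RESULTS section raises about which layer is best to extract features from; in practice that choice would be made on held-out validation data rather than on the test set as in this toy loop.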