Hybrid deep learning approach to improve classification of low-volume high-dimensional data
BACKGROUND: The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well in high-dimensional datasets. Deep learning has sh...
Main authors: | Mavaie, Pegah; Holder, Lawrence; Skinner, Michael K. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631218/ https://www.ncbi.nlm.nih.gov/pubmed/37936066 http://dx.doi.org/10.1186/s12859-023-05557-w |
_version_ | 1785132326704906240 |
---|---|
author | Mavaie, Pegah Holder, Lawrence Skinner, Michael K. |
author_facet | Mavaie, Pegah Holder, Lawrence Skinner, Michael K. |
author_sort | Mavaie, Pegah |
collection | PubMed |
description | BACKGROUND: The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well in high-dimensional datasets. Deep learning has shown success in feature generation but requires large datasets to achieve high classification accuracy. Biology domains typically exhibit these challenges with numerous handcrafted features (high-dimensional) and small amounts of training data (low volume). METHOD: A hybrid learning approach is proposed that first trains a deep network on the training data, extracts features from the deep network, and then uses these features to re-express the data for input to a non-deep learning method, which is trained to perform the final classification. RESULTS: The approach is systematically evaluated to determine the best layer of the deep learning network from which to extract features and the threshold on training data volume that prefers this approach. Results from several domains show that this hybrid approach outperforms standalone deep and non-deep learning methods, especially on low-volume, high-dimensional datasets. The diverse collection of datasets further supports the robustness of the approach across different domains. CONCLUSIONS: The hybrid approach combines the strengths of deep and non-deep learning paradigms to achieve high performance on high-dimensional, low volume learning tasks that are typical in biology domains. |
format | Online Article Text |
id | pubmed-10631218 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10631218 2023-11-07 Hybrid deep learning approach to improve classification of low-volume high-dimensional data Mavaie, Pegah Holder, Lawrence Skinner, Michael K. BMC Bioinformatics Research BACKGROUND: The performance of machine learning classification methods relies heavily on the choice of features. In many domains, feature generation can be labor-intensive and require domain knowledge, and feature selection methods do not scale well in high-dimensional datasets. Deep learning has shown success in feature generation but requires large datasets to achieve high classification accuracy. Biology domains typically exhibit these challenges with numerous handcrafted features (high-dimensional) and small amounts of training data (low volume). METHOD: A hybrid learning approach is proposed that first trains a deep network on the training data, extracts features from the deep network, and then uses these features to re-express the data for input to a non-deep learning method, which is trained to perform the final classification. RESULTS: The approach is systematically evaluated to determine the best layer of the deep learning network from which to extract features and the threshold on training data volume that prefers this approach. Results from several domains show that this hybrid approach outperforms standalone deep and non-deep learning methods, especially on low-volume, high-dimensional datasets. The diverse collection of datasets further supports the robustness of the approach across different domains. CONCLUSIONS: The hybrid approach combines the strengths of deep and non-deep learning paradigms to achieve high performance on high-dimensional, low volume learning tasks that are typical in biology domains. 
BioMed Central 2023-11-07 /pmc/articles/PMC10631218/ /pubmed/37936066 http://dx.doi.org/10.1186/s12859-023-05557-w Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Mavaie, Pegah Holder, Lawrence Skinner, Michael K. Hybrid deep learning approach to improve classification of low-volume high-dimensional data |
title | Hybrid deep learning approach to improve classification of low-volume high-dimensional data |
title_full | Hybrid deep learning approach to improve classification of low-volume high-dimensional data |
title_fullStr | Hybrid deep learning approach to improve classification of low-volume high-dimensional data |
title_full_unstemmed | Hybrid deep learning approach to improve classification of low-volume high-dimensional data |
title_short | Hybrid deep learning approach to improve classification of low-volume high-dimensional data |
title_sort | hybrid deep learning approach to improve classification of low-volume high-dimensional data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10631218/ https://www.ncbi.nlm.nih.gov/pubmed/37936066 http://dx.doi.org/10.1186/s12859-023-05557-w |
work_keys_str_mv | AT mavaiepegah hybriddeeplearningapproachtoimproveclassificationoflowvolumehighdimensionaldata AT holderlawrence hybriddeeplearningapproachtoimproveclassificationoflowvolumehighdimensionaldata AT skinnermichaelk hybriddeeplearningapproachtoimproveclassificationoflowvolumehighdimensionaldata |
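
The METHOD paragraph in the description field outlines a three-step pipeline: train a deep network, extract features from one of its layers, and train a non-deep learner on the re-expressed data. The following is a minimal sketch of that idea, not the authors' implementation. Everything in it is an illustrative assumption: the synthetic data, the single tanh hidden layer standing in for a "deep" network, the nearest-centroid classifier standing in for the non-deep method, and all names (`make_data`, `train_mlp`, `hidden_features`, `fit_centroids`, `predict`) and hyperparameters.

```python
import math
import random

def make_data(n=20, d=10, seed=1):
    """Synthetic low-volume, higher-dimensional data (assumption: toy data only)."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        shift = 1.0 if label else -1.0
        # Only the first two dimensions carry class signal; the rest are noise.
        X.append([rng.gauss(shift if j < 2 else 0.0, 1.0) for j in range(d)])
        y.append(label)
    return X, y

def hidden_features(X, W1, b1):
    """Re-express each sample as the tanh activations of the hidden layer."""
    return [[math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
             for row, b in zip(W1, b1)] for x in X]

def train_mlp(X, y, hidden=4, epochs=100, lr=0.1, seed=0):
    """Step 1: train a small network (one tanh layer, sigmoid output) by SGD."""
    rng = random.Random(seed)
    d = len(X[0])
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    b1 = [0.0] * hidden
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden)]
    b2 = 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            h = hidden_features([x], W1, b1)[0]
            z = sum(w * hi for w, hi in zip(W2, h)) + b2
            z = max(-30.0, min(30.0, z))        # clamp to avoid overflow in exp
            p = 1.0 / (1.0 + math.exp(-z))
            g = p - t                           # cross-entropy gradient w.r.t. z
            for j in range(hidden):
                gh = g * W2[j] * (1.0 - h[j] ** 2)  # backprop through tanh
                W2[j] -= lr * g * h[j]
                b1[j] -= lr * gh
                for i in range(d):
                    W1[j][i] -= lr * gh * x[i]
            b2 -= lr * g
    return W1, b1  # only the hidden layer is needed for feature extraction

def fit_centroids(F, y):
    """Step 3: fit a non-deep learner (here, nearest centroid) on the features."""
    cents = {}
    for label in set(y):
        rows = [f for f, t in zip(F, y) if t == label]
        cents[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return cents

def predict(cents, F):
    """Assign each feature vector to the class with the closest centroid."""
    def dist(a, b):
        return sum((u - v) ** 2 for u, v in zip(a, b))
    return [min(cents, key=lambda c: dist(cents[c], f)) for f in F]

X, y = make_data()
X_train, y_train, X_test, y_test = X[:14], y[:14], X[14:], y[14:]

W1, b1 = train_mlp(X_train, y_train)                 # step 1: train the network
features_train = hidden_features(X_train, W1, b1)    # step 2: extract features
features_test = hidden_features(X_test, W1, b1)
centroids = fit_centroids(features_train, y_train)   # step 3: non-deep learner
preds = predict(centroids, features_test)
accuracy = sum(p == t for p, t in zip(preds, y_test)) / len(y_test)
print("hybrid test accuracy:", accuracy)
```

In practice the network would have several layers, and the abstract notes that the choice of layer to extract from is itself something to evaluate; in this sketch there is only one hidden layer to choose.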