
ClearF++: Improved Supervised Feature Scoring Using Feature Clustering in Class-Wise Embedding and Reconstruction



Bibliographic Details
Main Authors: Wang, Sehee; Kim, So Yeon; Sohn, Kyung-Ah
Format: Online Article Text
Language: English
Published: MDPI 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10376817/
https://www.ncbi.nlm.nih.gov/pubmed/37508851
http://dx.doi.org/10.3390/bioengineering10070824
author Wang, Sehee
Kim, So Yeon
Sohn, Kyung-Ah
collection PubMed
description Feature selection methods are essential for accurate disease classification and identifying informative biomarkers. While information-theoretic methods have been widely used, they often exhibit limitations such as high computational costs. Our previously proposed method, ClearF, addresses these issues by using reconstruction error from low-dimensional embeddings as a proxy for the entropy term in the mutual information. However, ClearF still has limitations, including a nontransparent bottleneck layer selection process, which can result in unstable feature selection. To address these limitations, we propose ClearF++, which simplifies the bottleneck layer selection and incorporates feature-wise clustering to enhance biomarker detection. We compare its performance with other commonly used methods such as MultiSURF and IFS, as well as ClearF, across multiple benchmark datasets. Our results demonstrate that ClearF++ consistently outperforms these methods in terms of prediction accuracy and stability, even with limited samples. We also observe that employing the Deep Embedded Clustering (DEC) algorithm for feature-wise clustering improves performance, indicating its suitability for handling complex data structures with limited samples. ClearF++ offers an improved biomarker prioritization approach with enhanced prediction performance and faster execution. Its stability and effectiveness with limited samples make it particularly valuable for biomedical data analysis.
format Online
Article
Text
id pubmed-10376817
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10376817 2023-07-29 ClearF++: Improved Supervised Feature Scoring Using Feature Clustering in Class-Wise Embedding and Reconstruction; Wang, Sehee; Kim, So Yeon; Sohn, Kyung-Ah; Bioengineering (Basel); Article; MDPI 2023-07-10; /pmc/articles/PMC10376817/ /pubmed/37508851 http://dx.doi.org/10.3390/bioengineering10070824 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title ClearF++: Improved Supervised Feature Scoring Using Feature Clustering in Class-Wise Embedding and Reconstruction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10376817/
https://www.ncbi.nlm.nih.gov/pubmed/37508851
http://dx.doi.org/10.3390/bioengineering10070824
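Note on the description field: the abstract says ClearF uses reconstruction error from low-dimensional embeddings as a proxy for the entropy term in the mutual information. The toy sketch below illustrates one possible reading of that idea and is not the authors' implementation. It assumes PCA as the embedding, scores each feature as its reconstruction error on the whole data minus the class-weighted reconstruction error within each class, and omits the bottleneck selection and the feature-wise clustering (including DEC) that ClearF++ adds. The function names `pca_reconstruction_error` and `classwise_scores` and the synthetic data are hypothetical.

```python
import numpy as np


def pca_reconstruction_error(X, n_components):
    """Per-feature mean squared error after a rank-k PCA reconstruction."""
    Xc = X - X.mean(axis=0)                      # center each feature
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    V = Vt[:n_components].T                      # (n_features, k) principal directions
    X_hat = Xc @ V @ V.T                         # project down, then reconstruct
    return ((Xc - X_hat) ** 2).mean(axis=0)


def classwise_scores(X, y, n_components=1):
    """Assumed score: whole-data error minus class-weighted within-class error."""
    total = pca_reconstruction_error(X, n_components)
    classes, counts = np.unique(y, return_counts=True)
    weights = counts / counts.sum()
    within = sum(
        w * pca_reconstruction_error(X[y == c], n_components)
        for c, w in zip(classes, weights)
    )
    return total - within


# Synthetic data (assumption): feature 0 shifts with the class label,
# features 1 and 2 share a large class-independent latent factor,
# and feature 3 is pure noise.
rng = np.random.default_rng(0)
n = 400
y = np.repeat([0, 1], n // 2)
latent = rng.normal(scale=5.0, size=n)
X = np.column_stack([
    np.where(y == 1, 1.5, -1.5) + rng.normal(size=n),
    latent + rng.normal(scale=0.5, size=n),
    latent + rng.normal(scale=0.5, size=n),
    rng.normal(size=n),
])

scores = classwise_scores(X, y, n_components=1)
print(np.round(scores, 3))
print("top-ranked feature:", int(np.argmax(scores)))
```

In this setup the single principal component is taken up by the shared latent factor, so the class-dependent feature keeps a large whole-data error that drops once each class is embedded separately. With PCA as the stand-in, a class-separating feature that dominates the variance could instead be absorbed by the global components, so the sketch should not be read as a general-purpose scorer.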