
A novel autoencoder approach to feature extraction with linear separability for high-dimensional data

Feature extraction depends on the input data carrying sufficient information; however, data in a high-dimensional space are distributed too sparsely to provide it. Furthermore, high dimensionality of the data also creates trouble for the se...


Bibliographic Details
Main Authors:	Zheng, Jian, Qu, Hongchun, Li, Zhaoni, Li, Lin, Tang, Xiaoming, Guo, Fei
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2022
Subjects:	Data Mining and Machine Learning
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403198/ https://www.ncbi.nlm.nih.gov/pubmed/37547057 http://dx.doi.org/10.7717/peerj-cs.1061

_version_	1785085016822251520
author	Zheng, Jian Qu, Hongchun Li, Zhaoni Li, Lin Tang, Xiaoming Guo, Fei
author_facet	Zheng, Jian Qu, Hongchun Li, Zhaoni Li, Lin Tang, Xiaoming Guo, Fei
author_sort	Zheng, Jian
collection	PubMed
description	Feature extraction depends on the input data carrying sufficient information; however, data in a high-dimensional space are distributed too sparsely to provide it. High dimensionality also makes it harder to search for features scattered across subspaces. Feature extraction from high-dimensional data is therefore a difficult task. To address this issue, this article proposes a novel autoencoder method that uses a Mahalanobis distance metric of rescaling transformation. The key idea is that applying the Mahalanobis distance metric of rescaling transformation reduces the difference between the reconstructed distribution and the original distribution, which improves the autoencoder's feature-extraction ability. Results show that the proposed approach outperforms state-of-the-art methods in both the accuracy of feature extraction and the linear separability of the extracted features. We find that distance metric-based methods are better suited than feature selection-based methods to extracting linearly separable features from high-dimensional data. In a high-dimensional space, evaluating feature similarity is relatively easier than evaluating feature importance, so distance metric methods, which evaluate feature similarity, have an advantage for feature extraction over feature selection methods, which assess feature importance. Evaluating feature importance is, however, more computationally efficient than evaluating feature similarity.
format	Online Article Text
id	pubmed-10403198
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-10403198 2023-08-05 A novel autoencoder approach to feature extraction with linear separability for high-dimensional data Zheng, Jian Qu, Hongchun Li, Zhaoni Li, Lin Tang, Xiaoming Guo, Fei PeerJ Comput Sci Data Mining and Machine Learning PeerJ Inc. 2022-08-11 /pmc/articles/PMC10403198/ /pubmed/37547057 http://dx.doi.org/10.7717/peerj-cs.1061 Text en ©2022 Zheng et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	A novel autoencoder approach to feature extraction with linear separability for high-dimensional data
topic	Data Mining and Machine Learning
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10403198/ https://www.ncbi.nlm.nih.gov/pubmed/37547057 http://dx.doi.org/10.7717/peerj-cs.1061
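
The description field in this record refers to a Mahalanobis distance metric of rescaling transformation but does not spell it out. As a minimal sketch, assuming only the standard textbook definition, the Python below computes the squared Mahalanobis distance directly and then again as a plain Euclidean distance after a linear rescaling (whitening) of the data. It is not the authors' autoencoder or their implementation; the function names, the toy data, and the choice of a Cholesky-based rescaling are all illustrative assumptions.

```python
import numpy as np


def mahalanobis_sq(x, y, cov):
    """Squared Mahalanobis distance between x and y under covariance cov."""
    diff = x - y
    # Solve cov @ z = diff instead of forming the inverse explicitly.
    return float(diff @ np.linalg.solve(cov, diff))


def whitening_matrix(cov):
    """Return W with W.T @ W == inv(cov) (hypothetical helper).

    With cov = L @ L.T (Cholesky), W = inv(L) gives
    W.T @ W = inv(L).T @ inv(L) = inv(L @ L.T) = inv(cov).
    """
    L = np.linalg.cholesky(cov)
    return np.linalg.inv(L)


# Toy correlated data: 500 samples, 5 features (made up, not from the article).
rng = np.random.default_rng(0)
A = rng.normal(size=(5, 5))
X = rng.normal(size=(500, 5)) @ A.T

cov = np.cov(X, rowvar=False)
W = whitening_matrix(cov)

# Rescaled data: its sample covariance is the identity matrix.
Z = X @ W.T

x, y = X[0], X[1]
d_direct = mahalanobis_sq(x, y, cov)
d_rescaled = float(np.sum((Z[0] - Z[1]) ** 2))  # Euclidean distance after rescaling

print(d_direct, d_rescaled)
```

The two printed values agree up to floating-point error, which is the sense in which a Mahalanobis metric can be read as a rescaling transformation followed by an ordinary Euclidean distance. How the article embeds such a metric in the autoencoder's training objective is not stated in this record.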
