Optimized Mahalanobis–Taguchi System for High-Dimensional Small Sample Data Classification
The Mahalanobis–Taguchi system (MTS) is a multivariate data diagnosis and prediction technology that is widely used for large-sample or unbalanced data but rarely for high-dimensional small-sample data. In this paper, the optimized MTS for the classification of high-dimensional small-sample data is discussed from two aspects: the instability of the inverse of the covariance matrix and the instability of feature selection. Firstly, based on regularization and smoothing techniques, this paper proposes a modified Mahalanobis metric to calculate the Mahalanobis distance, aimed at reducing the influence of inverse-matrix instability under small-sample conditions. Secondly, the minimum redundancy-maximum relevance (mRMR) algorithm is introduced into the MTS to address the instability of feature selection. Using the mRMR algorithm and the signal-to-noise ratio (SNR), a two-stage feature selection method is proposed: the mRMR algorithm is first used to remove noisy and redundant variables; an orthogonal array and the SNR are then used to screen the combination of variables that contributes most to classification. The feasibility and simplicity of the optimized MTS are then demonstrated on five datasets from the UCI database. The Mahalanobis distance based on regularization and smoothing techniques (RS-MD) is more robust than the traditional Mahalanobis distance, and the two-stage feature selection method improves the effectiveness of feature selection for MTS. Finally, the optimized MTS is applied to email classification on the Spambase dataset. The results show that the optimized MTS outperforms the classical MTS and three other machine learning algorithms.
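The record reproduces only the abstract, not the paper's RS-MD formula, so the sketch below illustrates the general idea in Python: the correlation matrix of the normal group is shrunk toward the identity before inversion so that the scaled Mahalanobis distance stays computable when samples are few. The function name `regularized_mahalanobis`, the identity shrinkage target, and the parameter `lambda_reg` are illustrative assumptions, not the authors' notation.

```python
# Sketch of a regularized Mahalanobis distance for small-sample data.
# NOTE: this is an illustrative shrinkage-based stand-in, not the paper's
# exact RS-MD construction; lambda_reg and the shrinkage target are assumptions.
import numpy as np

def regularized_mahalanobis(X_normal, X_test, lambda_reg=0.1):
    """Scaled Mahalanobis distances of X_test from the 'normal' group, with the
    correlation matrix shrunk toward the identity so its inverse stays stable
    when the number of samples is small relative to the number of variables."""
    X_normal = np.asarray(X_normal, dtype=float)
    X_test = np.atleast_2d(np.asarray(X_test, dtype=float))

    # Standardize with the normal-group statistics (as in classical MTS).
    mu = X_normal.mean(axis=0)
    sigma = X_normal.std(axis=0, ddof=1)
    sigma[sigma == 0] = 1.0
    Zn = (X_normal - mu) / sigma
    Zt = (X_test - mu) / sigma

    # Correlation matrix of the normal group, smoothed/regularized by
    # shrinking it toward the identity before inversion.
    R = np.corrcoef(Zn, rowvar=False)
    p = R.shape[0]
    R_reg = (1.0 - lambda_reg) * R + lambda_reg * np.eye(p)

    # Scaled Mahalanobis distance MD = z R^{-1} z^T / p for each test row.
    R_inv = np.linalg.inv(R_reg)
    return np.einsum('ij,jk,ik->i', Zt, R_inv, Zt) / p
```

The division by the number of variables follows the usual MTS convention of a scaled Mahalanobis distance, so normal-group samples cluster around 1 and abnormal samples score noticeably higher.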
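For the two-stage feature selection, the sketch below pairs an mRMR-style ranking (mutual information for relevance and, as a simplification, absolute correlation for redundancy rather than feature-to-feature mutual information) with the orthogonal-array / larger-the-better SNR screening used in classical MTS. The fixed L8(2^7) array, the helper names, and the positive-SNR-gain retention rule are illustrative choices, not details taken from the paper.

```python
# Sketch of a two-stage feature selection: mRMR-style ranking, then
# orthogonal-array / SNR screening. Illustrative only; details are assumptions.
import numpy as np
from sklearn.feature_selection import mutual_info_classif

def mrmr_rank(X, y, k):
    """Greedy mRMR-style ranking: maximize MI(feature, label) minus the
    mean absolute correlation with already-selected features."""
    X = np.asarray(X, dtype=float)
    relevance = mutual_info_classif(X, y, random_state=0)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < k:
        if not selected:
            best = remaining[int(np.argmax(relevance[remaining]))]
        else:
            scores = [relevance[j] - corr[j, selected].mean() for j in remaining]
            best = remaining[int(np.argmax(scores))]
        selected.append(best)
        remaining.remove(best)
    return selected

# Standard L8(2^7) orthogonal array: level 1 = keep the variable, level 2 = drop it.
L8 = np.array([[1,1,1,1,1,1,1],[1,1,1,2,2,2,2],[1,2,2,1,1,2,2],[1,2,2,2,2,1,1],
               [2,1,2,1,2,1,2],[2,1,2,2,1,2,1],[2,2,1,1,2,2,1],[2,2,1,2,1,1,2]])

def snr_larger_the_better(md_abnormal):
    """Taguchi larger-the-better SNR over abnormal-sample Mahalanobis distances."""
    md = np.asarray(md_abnormal, dtype=float)
    return -10.0 * np.log10(np.mean(1.0 / md**2))

def snr_screen(X_normal, X_abnormal, candidate_idx, md_func):
    """Keep candidates whose mean SNR is higher when included than when excluded."""
    assert len(candidate_idx) == L8.shape[1], "this toy L8 array screens exactly 7 variables"
    X_normal, X_abnormal = np.asarray(X_normal, float), np.asarray(X_abnormal, float)
    snrs = []
    for run in L8:
        keep = [candidate_idx[i] for i, level in enumerate(run) if level == 1]
        md = md_func(X_normal[:, keep], X_abnormal[:, keep])
        snrs.append(snr_larger_the_better(md))
    snrs = np.array(snrs)
    gains = [snrs[L8[:, i] == 1].mean() - snrs[L8[:, i] == 2].mean()
             for i in range(L8.shape[1])]
    return [v for v, g in zip(candidate_idx, gains) if g > 0]
```

In use, the top seven mRMR-ranked variables could be passed to `snr_screen` together with the `regularized_mahalanobis` sketch above as `md_func`; a larger orthogonal array would be needed to screen more candidates.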
Main Authors: | Xiao, Xinping; Fu, Dian; Shi, Yu; Wen, Jianghui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2020 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199641/ https://www.ncbi.nlm.nih.gov/pubmed/32405295 http://dx.doi.org/10.1155/2020/4609423 |
_version_ | 1783529187561177088 |
---|---|
author | Xiao, Xinping Fu, Dian Shi, Yu Wen, Jianghui |
author_facet | Xiao, Xinping Fu, Dian Shi, Yu Wen, Jianghui |
author_sort | Xiao, Xinping |
collection | PubMed |
description | The Mahalanobis–Taguchi system (MTS) is a multivariate data diagnosis and prediction technology that is widely used for large-sample or unbalanced data but rarely for high-dimensional small-sample data. In this paper, the optimized MTS for the classification of high-dimensional small-sample data is discussed from two aspects: the instability of the inverse of the covariance matrix and the instability of feature selection. Firstly, based on regularization and smoothing techniques, this paper proposes a modified Mahalanobis metric to calculate the Mahalanobis distance, aimed at reducing the influence of inverse-matrix instability under small-sample conditions. Secondly, the minimum redundancy-maximum relevance (mRMR) algorithm is introduced into the MTS to address the instability of feature selection. Using the mRMR algorithm and the signal-to-noise ratio (SNR), a two-stage feature selection method is proposed: the mRMR algorithm is first used to remove noisy and redundant variables; an orthogonal array and the SNR are then used to screen the combination of variables that contributes most to classification. The feasibility and simplicity of the optimized MTS are then demonstrated on five datasets from the UCI database. The Mahalanobis distance based on regularization and smoothing techniques (RS-MD) is more robust than the traditional Mahalanobis distance, and the two-stage feature selection method improves the effectiveness of feature selection for MTS. Finally, the optimized MTS is applied to email classification on the Spambase dataset. The results show that the optimized MTS outperforms the classical MTS and three other machine learning algorithms. |
format | Online Article Text |
id | pubmed-7199641 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-71996412020-05-13 Optimized Mahalanobis–Taguchi System for High-Dimensional Small Sample Data Classification Xiao, Xinping Fu, Dian Shi, Yu Wen, Jianghui Comput Intell Neurosci Research Article The Mahalanobis–Taguchi system (MTS) is a multivariate data diagnosis and prediction technology, which is widely used to optimize large sample data or unbalanced data, but it is rarely used for high-dimensional small sample data. In this paper, the optimized MTS for the classification of high-dimensional small sample data is discussed from two aspects, namely, the inverse matrix instability of the covariance matrix and the instability of feature selection. Firstly, based on regularization and smoothing techniques, this paper proposes a modified Mahalanobis metric to calculate the Mahalanobis distance, which is aimed at reducing the influence of the inverse matrix instability under small sample conditions. Secondly, the minimum redundancy-maximum relevance (mRMR) algorithm is introduced into the MTS for the instability problem of feature selection. By using the mRMR algorithm and signal-to-noise ratio (SNR), a two-stage feature selection method is proposed: the mRMR algorithm is first used to remove noise and redundant variables; the orthogonal table and SNR are then used to screen the combination of variables that make great contribution to classification. Then, the feasibility and simplicity of the optimized MTS are shown in five datasets from the UCI database. The Mahalanobis distance based on regularization and smoothing techniques (RS-MD) is more robust than the traditional Mahalanobis distance. The two-stage feature selection method improves the effectiveness of feature selection for MTS. Finally, the optimized MTS is applied to email classification of the Spambase dataset. The results show that the optimized MTS outperforms the classical MTS and the other 3 machine learning algorithms. Hindawi 2020-04-26 /pmc/articles/PMC7199641/ /pubmed/32405295 http://dx.doi.org/10.1155/2020/4609423 Text en Copyright © 2020 Xinping Xiao et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Xiao, Xinping Fu, Dian Shi, Yu Wen, Jianghui Optimized Mahalanobis–Taguchi System for High-Dimensional Small Sample Data Classification |
title | Optimized Mahalanobis–Taguchi System for High-Dimensional Small Sample Data Classification |
title_full | Optimized Mahalanobis–Taguchi System for High-Dimensional Small Sample Data Classification |
title_fullStr | Optimized Mahalanobis–Taguchi System for High-Dimensional Small Sample Data Classification |
title_full_unstemmed | Optimized Mahalanobis–Taguchi System for High-Dimensional Small Sample Data Classification |
title_short | Optimized Mahalanobis–Taguchi System for High-Dimensional Small Sample Data Classification |
title_sort | optimized mahalanobis–taguchi system for high-dimensional small sample data classification |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7199641/ https://www.ncbi.nlm.nih.gov/pubmed/32405295 http://dx.doi.org/10.1155/2020/4609423 |
work_keys_str_mv | AT xiaoxinping optimizedmahalanobistaguchisystemforhighdimensionalsmallsampledataclassification AT fudian optimizedmahalanobistaguchisystemforhighdimensionalsmallsampledataclassification AT shiyu optimizedmahalanobistaguchisystemforhighdimensionalsmallsampledataclassification AT wenjianghui optimizedmahalanobistaguchisystemforhighdimensionalsmallsampledataclassification |