A Robust Supervised Variable Selection for Noisy High-Dimensional Data

The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been...

Bibliographic Details
Main Authors:	Kalina, Jan; Schlenker, Anna
Format:	Online Article Text
Language:	English
Published:	Hindawi Publishing Corporation 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4468284/ https://www.ncbi.nlm.nih.gov/pubmed/26137474 http://dx.doi.org/10.1155/2015/320385

_version_	1782376482645999616
author	Kalina, Jan Schlenker, Anna
author_facet	Kalina, Jan Schlenker, Anna
author_sort	Kalina, Jan
collection	PubMed
description	The Minimum Redundancy Maximum Relevance (MRMR) approach to supervised variable selection represents a successful methodology for dimensionality reduction, which is suitable for high-dimensional data observed in two or more different groups. Various available versions of the MRMR approach have been designed to search for variables with the largest relevance for a classification task while controlling for redundancy of the selected set of variables. However, usual relevance and redundancy criteria have the disadvantages of being too sensitive to the presence of outlying measurements and/or being inefficient. We propose a novel approach called Minimum Regularized Redundancy Maximum Robust Relevance (MRRMRR), suitable for noisy high-dimensional data observed in two groups. It combines principles of regularization and robust statistics. Particularly, redundancy is measured by a new regularized version of the coefficient of multiple correlation and relevance is measured by a highly robust correlation coefficient based on the least weighted squares regression with data-adaptive weights. We compare various dimensionality reduction methods on three real data sets. To investigate the influence of noise or outliers on the data, we perform the computations also for data artificially contaminated by severe noise of various forms. The experimental results confirm the robustness of the method with respect to outliers.
format	Online Article Text
id	pubmed-4468284
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Hindawi Publishing Corporation
record_format	MEDLINE/PubMed
spelling	pubmed-4468284 2015-07-01 A Robust Supervised Variable Selection for Noisy High-Dimensional Data Kalina, Jan Schlenker, Anna Biomed Res Int Research Article Hindawi Publishing Corporation 2015 2015-06-02 /pmc/articles/PMC4468284/ /pubmed/26137474 http://dx.doi.org/10.1155/2015/320385 Text en Copyright © 2015 J. Kalina and A. Schlenker. https://creativecommons.org/licenses/by/3.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Kalina, Jan Schlenker, Anna A Robust Supervised Variable Selection for Noisy High-Dimensional Data
title	A Robust Supervised Variable Selection for Noisy High-Dimensional Data
title_sort	robust supervised variable selection for noisy high-dimensional data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4468284/ https://www.ncbi.nlm.nih.gov/pubmed/26137474 http://dx.doi.org/10.1155/2015/320385
work_keys_str_mv	AT kalinajan arobustsupervisedvariableselectionfornoisyhighdimensionaldata AT schlenkeranna arobustsupervisedvariableselectionfornoisyhighdimensionaldata AT kalinajan robustsupervisedvariableselectionfornoisyhighdimensionaldata AT schlenkeranna robustsupervisedvariableselectionfornoisyhighdimensionaldata
