
Feature selection with the R package MXM

Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R and made publicly av...


Bibliographic Details
Main Authors:	Tsagris, Michail; Tsamardinos, Ioannis
Format:	Online Article Text
Language:	English
Published:	F1000 Research Limited 2019
Subjects:	Software Tool Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6792475/ https://www.ncbi.nlm.nih.gov/pubmed/31656581 http://dx.doi.org/10.12688/f1000research.16216.2

author	Tsagris, Michail; Tsamardinos, Ioannis
collection	PubMed
description	Feature (or variable) selection is the process of identifying the minimal set of features with the highest predictive performance on the target variable of interest. Numerous feature selection algorithms have been developed over the years, but only a few have been implemented in R and made publicly available as R packages, and those offer few options. The R package MXM offers a variety of feature selection algorithms, and has unique features that make it advantageous over its competitors: a) it contains feature selection algorithms that can treat numerous types of target variables, including continuous, percentages, time to event (survival), binary, nominal, ordinal, clustered, counts, left censored, etc.; b) it contains a variety of regression models that can be plugged into the feature selection algorithms (for example, with time-to-event data the user can choose among Cox, Weibull, log-logistic or exponential regression); c) it includes an algorithm for detecting multiple solutions (many sets of statistically equivalent features; plainly speaking, two features carry statistically equivalent information when substituting one for the other does not affect the inference or the conclusions); and d) it includes memory-efficient algorithms for high-volume data, that is, data that cannot be loaded into R directly (on a machine with 16 GB of RAM, for example, R cannot directly load 16 GB of data; by utilizing the proper package, we load the data and then perform feature selection). In this paper, we qualitatively compare MXM with other relevant feature selection packages and discuss its advantages and disadvantages. Further, we provide a demonstration of MXM’s algorithms using real high-dimensional data from various applications.
format	Online Article Text
id	pubmed-6792475
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	F1000 Research Limited
record_format	MEDLINE/PubMed
spelling	pubmed-6792475 2019-10-25 Feature selection with the R package MXM Tsagris, Michail; Tsamardinos, Ioannis F1000Res Software Tool Article
F1000 Research Limited 2019-09-30 /pmc/articles/PMC6792475/ /pubmed/31656581 http://dx.doi.org/10.12688/f1000research.16216.2 Text en Copyright: © 2019 Tsagris M and Tsamardinos I http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Feature selection with the R package MXM
topic	Software Tool Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6792475/ https://www.ncbi.nlm.nih.gov/pubmed/31656581 http://dx.doi.org/10.12688/f1000research.16216.2
