
Application of feature selection methods for automated clustering analysis: a review on synthetic datasets

The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in the field of machine learning. This process is a manually intensive task and requires skilled data scientists to apply exploratory data analysis techniques and statistical methods in...


Bibliographic Details
Main Authors:	Ahmad, Aliyu Usman; Starkey, Andrew
Format:	Online Article Text
Language:	English
Published:	Springer London 2017
Subjects:	Eann 2016
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5857284/ https://www.ncbi.nlm.nih.gov/pubmed/29576689 http://dx.doi.org/10.1007/s00521-017-3005-9

_version_	1783307441522343936
author	Ahmad, Aliyu Usman Starkey, Andrew
author_facet	Ahmad, Aliyu Usman Starkey, Andrew
author_sort	Ahmad, Aliyu Usman
collection	PubMed
description	The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in the field of machine learning. This process is a manually intensive task and requires skilled data scientists to apply exploratory data analysis techniques and statistical methods in pre-processing datasets for meaningful analysis with machine learning methods. However, the massive growth of data has brought about the need for fully automated data analysis methods. One of the key challenges is the accurate selection of a set of relevant features, which can be buried in high-dimensional data along with irrelevant noisy features, by choosing a subset of the complete set of input features that predicts the output with higher accuracy comparable to the performance of the complete input set. Kohonen’s self-organising neural network map has been utilised in various ways for this task, such as with the weighted self-organising map (WSOM) approach and this method is reviewed for its efficacy. The study demonstrates that the WSOM approach can result in different results on different runs on a given dataset due to the inappropriate use of the steepest descent optimisation method to minimise the weighted SOM’s cost function. An alternative feature weighting approach based on analysis of the SOM after training is presented; the proposed approach allows the SOM to converge before analysing the input relevance, unlike the WSOM that aims to apply weighting to the inputs during the training which distorts the SOM’s cost function, resulting in multiple local minimums meaning the SOM does not consistently converge to the same state. We demonstrate the superiority of the proposed method over the WSOM and a standard SOM in feature selection with improved clustering analysis.
format	Online Article Text
id	pubmed-5857284
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	Springer London
record_format	MEDLINE/PubMed
spelling	pubmed-58572842018-03-21 Application of feature selection methods for automated clustering analysis: a review on synthetic datasets Ahmad, Aliyu Usman Starkey, Andrew Neural Comput Appl Eann 2016 The effective modelling of high-dimensional data with hundreds to thousands of features remains a challenging task in the field of machine learning. This process is a manually intensive task and requires skilled data scientists to apply exploratory data analysis techniques and statistical methods in pre-processing datasets for meaningful analysis with machine learning methods. However, the massive growth of data has brought about the need for fully automated data analysis methods. One of the key challenges is the accurate selection of a set of relevant features, which can be buried in high-dimensional data along with irrelevant noisy features, by choosing a subset of the complete set of input features that predicts the output with higher accuracy comparable to the performance of the complete input set. Kohonen’s self-organising neural network map has been utilised in various ways for this task, such as with the weighted self-organising map (WSOM) approach and this method is reviewed for its efficacy. The study demonstrates that the WSOM approach can result in different results on different runs on a given dataset due to the inappropriate use of the steepest descent optimisation method to minimise the weighted SOM’s cost function. An alternative feature weighting approach based on analysis of the SOM after training is presented; the proposed approach allows the SOM to converge before analysing the input relevance, unlike the WSOM that aims to apply weighting to the inputs during the training which distorts the SOM’s cost function, resulting in multiple local minimums meaning the SOM does not consistently converge to the same state. 
We demonstrate the superiority of the proposed method over the WSOM and a standard SOM in feature selection with improved clustering analysis. Springer London 2017-04-22 2018 /pmc/articles/PMC5857284/ /pubmed/29576689 http://dx.doi.org/10.1007/s00521-017-3005-9 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
spellingShingle	Eann 2016 Ahmad, Aliyu Usman Starkey, Andrew Application of feature selection methods for automated clustering analysis: a review on synthetic datasets
title	Application of feature selection methods for automated clustering analysis: a review on synthetic datasets
title_full	Application of feature selection methods for automated clustering analysis: a review on synthetic datasets
title_fullStr	Application of feature selection methods for automated clustering analysis: a review on synthetic datasets
title_full_unstemmed	Application of feature selection methods for automated clustering analysis: a review on synthetic datasets
title_short	Application of feature selection methods for automated clustering analysis: a review on synthetic datasets
title_sort	application of feature selection methods for automated clustering analysis: a review on synthetic datasets
topic	Eann 2016
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5857284/ https://www.ncbi.nlm.nih.gov/pubmed/29576689 http://dx.doi.org/10.1007/s00521-017-3005-9
work_keys_str_mv	AT ahmadaliyuusman applicationoffeatureselectionmethodsforautomatedclusteringanalysisareviewonsyntheticdatasets AT starkeyandrew applicationoffeatureselectionmethodsforautomatedclusteringanalysisareviewonsyntheticdatasets
