
Adaptive kernel fuzzy clustering for missing data

Many machine learning procedures, including clustering analysis, are often affected by missing values. This work aims to propose and evaluate a Kernel Fuzzy C-means clustering algorithm considering the kernelization of the metric with local adaptive distances (VKFCM-K-LP) under three types of strateg...


Bibliographic Details
Main Authors:	Rodrigues, Anny K. G., Ospina, Raydonal, Ferreira, Marcelo R. P.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2021
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8589222/ https://www.ncbi.nlm.nih.gov/pubmed/34767560 http://dx.doi.org/10.1371/journal.pone.0259266

_version_	1784598653725310976
author	Rodrigues, Anny K. G. Ospina, Raydonal Ferreira, Marcelo R. P.
author_facet	Rodrigues, Anny K. G. Ospina, Raydonal Ferreira, Marcelo R. P.
author_sort	Rodrigues, Anny K. G.
collection	PubMed
description	Many machine learning procedures, including clustering analysis, are often affected by missing values. This work aims to propose and evaluate a Kernel Fuzzy C-means clustering algorithm considering the kernelization of the metric with local adaptive distances (VKFCM-K-LP) under three types of strategies to deal with missing data. The first strategy, called Whole Data Strategy (WDS), performs clustering only on the complete part of the dataset, i.e., it discards all instances with missing data. The second approach uses the Partial Distance Strategy (PDS), in which partial distances are computed among all available resources and then re-scaled by the reciprocal of the proportion of observed values. The third technique, called Optimal Completion Strategy (OCS), computes missing values iteratively as auxiliary variables in the optimization of a suitable objective function. The clustering results were evaluated according to different metrics. The best performance of the clustering algorithm was achieved under the PDS and OCS strategies. Under the OCS approach, new datasets were derived and the missing values were estimated dynamically in the optimization process. The results of clustering under the OCS strategy also showed superior performance when compared to the clusters obtained by applying the VKFCM-K-LP algorithm to a version of the data in which missing values were previously imputed by the mean or the median of the observed values.
format	Online Article Text
id	pubmed-8589222
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-8589222 2021-11-13 Adaptive kernel fuzzy clustering for missing data Rodrigues, Anny K. G. Ospina, Raydonal Ferreira, Marcelo R. P. PLoS One Research Article
Public Library of Science 2021-11-12 /pmc/articles/PMC8589222/ /pubmed/34767560 http://dx.doi.org/10.1371/journal.pone.0259266 Text en © 2021 Rodrigues et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Rodrigues, Anny K. G. Ospina, Raydonal Ferreira, Marcelo R. P. Adaptive kernel fuzzy clustering for missing data
title	Adaptive kernel fuzzy clustering for missing data
title_full	Adaptive kernel fuzzy clustering for missing data
title_fullStr	Adaptive kernel fuzzy clustering for missing data
title_full_unstemmed	Adaptive kernel fuzzy clustering for missing data
title_short	Adaptive kernel fuzzy clustering for missing data
title_sort	adaptive kernel fuzzy clustering for missing data
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8589222/ https://www.ncbi.nlm.nih.gov/pubmed/34767560 http://dx.doi.org/10.1371/journal.pone.0259266
work_keys_str_mv	AT rodriguesannykg adaptivekernelfuzzyclusteringformissingdata AT ospinaraydonal adaptivekernelfuzzyclusteringformissingdata AT ferreiramarcelorp adaptivekernelfuzzyclusteringformissingdata
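
The Partial Distance Strategy (PDS) summarized in the description can be illustrated with a small sketch. It is not the authors' implementation: it uses a plain squared Euclidean distance rather than the kernelized, locally adaptive distance of VKFCM-K-LP, and the function name `partial_squared_distance`, the NaN encoding of missing values, and the sample vectors are assumptions made only for illustration. The distance is summed over the observed coordinates and then multiplied by the reciprocal of the proportion of observed values.

```python
import math


def partial_squared_distance(x, v):
    """Partial squared Euclidean distance between an instance x and a prototype v.

    Missing entries of x are assumed to be encoded as NaN (an illustrative
    convention, not taken from the record). The sum over observed coordinates
    is rescaled by p / n_observed, i.e. the reciprocal of the proportion of
    observed values.
    """
    if len(x) != len(v):
        raise ValueError("x and v must have the same length")

    # Keep only the coordinates where x is observed.
    observed = [(xj, vj) for xj, vj in zip(x, v) if not math.isnan(xj)]
    if not observed:
        raise ValueError("x has no observed values")

    partial = sum((xj - vj) ** 2 for xj, vj in observed)
    return (len(x) / len(observed)) * partial


# Hypothetical instance with one missing feature and a hypothetical prototype.
x = [1.0, float("nan"), 3.0, 5.0]
v = [0.0, 2.0, 1.0, 5.0]
d = partial_squared_distance(x, v)
```

Here three of the four features are observed, so the partial sum 1 + 4 + 0 = 5 is scaled by 4/3. When no value is missing, the scale factor is 1 and the function reduces to the ordinary squared Euclidean distance.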
