
Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping

Estimating a predictive model from a dataset is best initiated with an unbiased estimator. However, since the unbiased estimator is unknown in general, the problem of the bias-variance tradeoff is raised. Aside from searching for an unbiased estimator, the convenient approach to the problem of the b...


Bibliographic Details
Main author: Kim, Jeongwoo
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6827892/
https://www.ncbi.nlm.nih.gov/pubmed/31682618
http://dx.doi.org/10.1371/journal.pone.0223529
author Kim, Jeongwoo
collection PubMed
description Estimating a predictive model from a dataset is best initiated with an unbiased estimator. However, since the unbiased estimator is unknown in general, the problem of the bias-variance tradeoff is raised. Aside from searching for an unbiased estimator, the convenient approach to the problem of the bias-variance tradeoff may be to use the clustering method. Within a cluster whose size is smaller than the whole sample, we would expect the simple form of the estimator for prediction to avoid the overfitting problem. In this paper, we propose a new method to find the optimal cluster for prediction. Based on the previous literature, this cluster is considered to exist somewhere between the whole dataset and the typical cluster determined by partitioning data. To obtain a reliable cluster size, we use the bootstrap method in this paper. Additionally, through experiments with simulated and real-world data, we show that the prediction error can be reduced by applying this new method. We believe that our proposed method will be useful in many applications using a clustering algorithm for a stable prediction performance.
format Online
Article
Text
id pubmed-6827892
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6827892 2019-11-12 Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping Kim, Jeongwoo PLoS One Research Article Estimating a predictive model from a dataset is best initiated with an unbiased estimator. However, since the unbiased estimator is unknown in general, the problem of the bias-variance tradeoff is raised. Aside from searching for an unbiased estimator, the convenient approach to the problem of the bias-variance tradeoff may be to use the clustering method. Within a cluster whose size is smaller than the whole sample, we would expect the simple form of the estimator for prediction to avoid the overfitting problem. In this paper, we propose a new method to find the optimal cluster for prediction. Based on the previous literature, this cluster is considered to exist somewhere between the whole dataset and the typical cluster determined by partitioning data. To obtain a reliable cluster size, we use the bootstrap method in this paper. Additionally, through experiments with simulated and real-world data, we show that the prediction error can be reduced by applying this new method. We believe that our proposed method will be useful in many applications using a clustering algorithm for a stable prediction performance. Public Library of Science 2019-11-04 /pmc/articles/PMC6827892/ /pubmed/31682618 http://dx.doi.org/10.1371/journal.pone.0223529 Text en © 2019 Jeongwoo Kim http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping
topic Research Article