Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping
Estimating a predictive model from a dataset is best initiated with an unbiased estimator. However, since the unbiased estimator is unknown in general, the problem of the bias-variance tradeoff is raised. Aside from searching for an unbiased estimator, the convenient approach to the problem of the b...
Main author: | Kim, Jeongwoo |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2019 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6827892/ https://www.ncbi.nlm.nih.gov/pubmed/31682618 http://dx.doi.org/10.1371/journal.pone.0223529 |
_version_ | 1783465358726791168 |
---|---|
author | Kim, Jeongwoo |
collection | PubMed |
description | Estimating a predictive model from a dataset is best initiated with an unbiased estimator. However, since the unbiased estimator is unknown in general, the problem of the bias-variance tradeoff is raised. Aside from searching for an unbiased estimator, the convenient approach to the problem of the bias-variance tradeoff may be to use the clustering method. Within a cluster whose size is smaller than the whole sample, we would expect the simple form of the estimator for prediction to avoid the overfitting problem. In this paper, we propose a new method to find the optimal cluster for prediction. Based on the previous literature, this cluster is considered to exist somewhere between the whole dataset and the typical cluster determined by partitioning data. To obtain a reliable cluster size, we use the bootstrap method in this paper. Additionally, through experiments with simulated and real-world data, we show that the prediction error can be reduced by applying this new method. We believe that our proposed method will be useful in many applications using a clustering algorithm for a stable prediction performance. |
format | Online Article Text |
id | pubmed-6827892 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-68278922019-11-12 Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping Kim, Jeongwoo PLoS One Research Article Estimating a predictive model from a dataset is best initiated with an unbiased estimator. However, since the unbiased estimator is unknown in general, the problem of the bias-variance tradeoff is raised. Aside from searching for an unbiased estimator, the convenient approach to the problem of the bias-variance tradeoff may be to use the clustering method. Within a cluster whose size is smaller than the whole sample, we would expect the simple form of the estimator for prediction to avoid the overfitting problem. In this paper, we propose a new method to find the optimal cluster for prediction. Based on the previous literature, this cluster is considered to exist somewhere between the whole dataset and the typical cluster determined by partitioning data. To obtain a reliable cluster size, we use the bootstrap method in this paper. Additionally, through experiments with simulated and real-world data, we show that the prediction error can be reduced by applying this new method. We believe that our proposed method will be useful in many applications using a clustering algorithm for a stable prediction performance. Public Library of Science 2019-11-04 /pmc/articles/PMC6827892/ /pubmed/31682618 http://dx.doi.org/10.1371/journal.pone.0223529 Text en © 2019 Jeongwoo Kim http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Kim, Jeongwoo Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping |
title | Optimally adjusted last cluster for prediction based on balancing the bias and variance by bootstrapping |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6827892/ https://www.ncbi.nlm.nih.gov/pubmed/31682618 http://dx.doi.org/10.1371/journal.pone.0223529 |
work_keys_str_mv | AT kimjeongwoo optimallyadjustedlastclusterforpredictionbasedonbalancingthebiasandvariancebybootstrapping |