
Stratified Sampling Design Based on Data Mining

OBJECTIVES: To explore classification rules based on data mining methodologies which are to be used in defining strata in stratified sampling of healthcare providers with improved sampling efficiency. METHODS: We performed k-means clustering to group providers with similar characteristics, then, con...


Bibliographic Details
Main authors: Kim, Yeonkook J., Oh, Yoonhwan, Park, Sunghoon, Cho, Sungzoon, Park, Hayoung
Format: Online Article Text
Language: English
Published: Korean Society of Medical Informatics 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3810526/
https://www.ncbi.nlm.nih.gov/pubmed/24175117
http://dx.doi.org/10.4258/hir.2013.19.3.186
author Kim, Yeonkook J.
Oh, Yoonhwan
Park, Sunghoon
Cho, Sungzoon
Park, Hayoung
collection PubMed
description OBJECTIVES: To explore classification rules, based on data mining methodologies, for defining strata in stratified sampling of healthcare providers with improved sampling efficiency. METHODS: We performed k-means clustering to group providers with similar characteristics, then constructed decision trees on the cluster labels to generate stratification rules. To evaluate the performance of the sampling design, we assessed the variance explained by the stratification proposed in this study and by conventional stratification. We constructed a study database from health insurance claims data and providers' profile data made available to this study by the Health Insurance Review and Assessment Service of South Korea, and from population data from Statistics Korea. From this database, we used the 2011 data for single-specialty clinics or hospitals in two specialties, general surgery and ophthalmology. RESULTS: Data mining produced five strata in general surgery with two stratification variables, the number of inpatients per specialist and the population density of the provider location, and five strata in ophthalmology with two stratification variables, the number of inpatients per specialist and the number of beds. The stratification explained 22% of the variance in annual changes in the productivity of specialists in general surgery and 8% in ophthalmology, whereas conventional stratification by the type of provider location and number of beds explained 2% and 0.2% of the variance, respectively. CONCLUSIONS: This study demonstrated that data mining methods can be used to design efficient stratified sampling with variables readily available to the insurer and government; it offers an alternative to the existing stratification method that is widely used in healthcare provider surveys in South Korea.
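The abstract compares stratifications by the share of outcome variance each one explains. The record does not give the formula the authors used, so the following is only a minimal sketch under one common assumption: variance explained is the between-strata sum of squares divided by the total sum of squares (the ANOVA-style ratio). The function name `variance_explained` and the toy data are hypothetical and do not come from the study.

```python
from collections import defaultdict
from statistics import fmean


def variance_explained(values, strata):
    """Share of the total variance in `values` accounted for by stratum membership.

    Assumed definition (not confirmed by the source):
    between-strata sum of squares / total sum of squares.
    """
    if len(values) != len(strata):
        raise ValueError("values and strata must have the same length")

    grand_mean = fmean(values)
    total_ss = sum((v - grand_mean) ** 2 for v in values)
    if total_ss == 0:
        return 0.0  # no variation to explain

    # Group the outcome values by stratum label.
    groups = defaultdict(list)
    for v, s in zip(values, strata):
        groups[s].append(v)

    # Each stratum contributes its size times the squared distance
    # of its mean from the grand mean.
    between_ss = sum(
        len(g) * (fmean(g) - grand_mean) ** 2 for g in groups.values()
    )
    return between_ss / total_ss


# Hypothetical toy data: one outcome per provider and two competing stratifications.
outcome = [1.0, 2.0, 3.0, 11.0, 12.0, 13.0]
mined_strata = ["A", "A", "A", "B", "B", "B"]
conventional_strata = ["urban", "rural", "urban", "rural", "urban", "rural"]

r2_mined = variance_explained(outcome, mined_strata)
r2_conventional = variance_explained(outcome, conventional_strata)
```

In this toy case the first labelling separates low and high outcomes cleanly and so explains far more variance than the second, which is the kind of comparison the abstract reports between mined and conventional strata.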
format Online
Article
Text
id pubmed-3810526
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Korean Society of Medical Informatics
record_format MEDLINE/PubMed
spelling pubmed-3810526 2013-10-30 Stratified Sampling Design Based on Data Mining Healthc Inform Res Original Article Korean Society of Medical Informatics 2013-09 2013-09-30 /pmc/articles/PMC3810526/ /pubmed/24175117 http://dx.doi.org/10.4258/hir.2013.19.3.186 Text en © 2013 The Korean Society of Medical Informatics http://creativecommons.org/licenses/by-nc/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Stratified Sampling Design Based on Data Mining
topic Original Article