Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis
BACKGROUND: Cluster analysis (CA) is a frequently used applied statistical technique that helps to reveal hidden structures and “clusters” found in large data sets. However, this method has not been widely used in large healthcare claims databases where the distribution of expenditure data is common...
Main Authors: | Liao, Minlei; Li, Yunfeng; Kianifard, Farid; Obi, Engels; Arcona, Stephen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4776444/ https://www.ncbi.nlm.nih.gov/pubmed/26936756 http://dx.doi.org/10.1186/s12882-016-0238-2 |
_version_ | 1782419158243213312 |
---|---|
author | Liao, Minlei; Li, Yunfeng; Kianifard, Farid; Obi, Engels; Arcona, Stephen |
author_sort | Liao, Minlei |
collection | PubMed |
description | BACKGROUND: Cluster analysis (CA) is a frequently used applied statistical technique that helps to reveal hidden structures and “clusters” in large data sets. However, this method has not been widely used in large healthcare claims databases, where the distribution of expenditure data is commonly severely skewed. The purpose of this study was to identify cost change patterns of patients with end-stage renal disease (ESRD) who initiated hemodialysis (HD) by applying different clustering methods. METHODS: A retrospective, cross-sectional, observational study was conducted using the Truven Health MarketScan® Research Databases. Patients aged ≥18 years with ≥2 ESRD diagnoses who initiated HD between 2008 and 2010 were included. The K-means CA method and hierarchical CA with various linkage methods were applied to all-cause costs within the baseline (12 months pre-HD) and follow-up (12 months post-HD) periods to identify clusters. Demographic, clinical, and cost information was extracted from both periods and then examined by cluster. RESULTS: A total of 18,380 patients were identified. Meaningful all-cause cost clusters were generated using K-means CA and hierarchical CA with either the flexible beta or Ward’s method. Based on cluster sample sizes and change of cost patterns, the K-means CA method with 4 clusters was selected: Cluster 1: Average to High (n = 113); Cluster 2: Very High to High (n = 89); Cluster 3: Average to Average (n = 16,624); and Cluster 4: Increasing Costs, High at Both Points (n = 1,554). Median costs from the 12-month pre-HD period to the 12-month post-HD period increased from $185,070 to $884,605 for Cluster 1 (Average to High), decreased from $910,930 to $157,997 for Cluster 2 (Very High to High), remained relatively stable and low ($15,168 to $13,026) for Cluster 3 (Average to Average), and increased from $57,909 to $193,140 for Cluster 4 (Increasing Costs, High at Both Points). Relatively stable costs after starting HD were associated with more stable comorbidity index scores between the pre- and post-HD periods, while increasing costs were associated with more sharply increasing comorbidity scores. CONCLUSIONS: The K-means CA method appeared to be the most appropriate for healthcare claims data with highly skewed cost information when taking into account both change of cost patterns and sample size in the smallest cluster. |
format | Online Article Text |
id | pubmed-4776444 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4776444 2016-03-04 Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis Liao, Minlei; Li, Yunfeng; Kianifard, Farid; Obi, Engels; Arcona, Stephen BMC Nephrol Research Article BioMed Central 2016-03-02 /pmc/articles/PMC4776444/ /pubmed/26936756 http://dx.doi.org/10.1186/s12882-016-0238-2 Text en © Liao et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Cluster analysis and its application to healthcare claims data: a study of end-stage renal disease patients who initiated hemodialysis |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4776444/ https://www.ncbi.nlm.nih.gov/pubmed/26936756 http://dx.doi.org/10.1186/s12882-016-0238-2 |
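
The abstract describes applying K-means CA to each patient's all-cause costs in the baseline and follow-up periods. The following is a minimal sketch of that idea, not the authors' implementation: the record does not state which software or preprocessing they used. The function names `kmeans` and `synthetic_costs`, the lognormal parameters, and the sample size of 500 are illustrative assumptions, and the data are synthetic, not study data.

```python
import random
from collections import Counter


def kmeans(points, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # initialise from k of the data points
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
        if new_labels == labels:  # assignments unchanged: converged
            break
        labels = new_labels
    return labels, centroids


def synthetic_costs(n, seed=1):
    """Hypothetical right-skewed (baseline, follow-up) cost pairs."""
    rng = random.Random(seed)
    return [(rng.lognormvariate(9.5, 1.0), rng.lognormvariate(9.5, 1.0))
            for _ in range(n)]


costs = synthetic_costs(500)
labels, centroids = kmeans(costs, k=4)
sizes = Counter(labels)  # cluster sample sizes, as compared in the abstract
for j, (pre, post) in enumerate(centroids):
    print(f"cluster {j}: n={sizes[j]}, mean pre={pre:,.0f}, mean post={post:,.0f}")
```

With heavily skewed costs, a few very expensive patients tend to pull centroids toward themselves and form small clusters, which is why the abstract weighs the sample size of the smallest cluster when comparing methods.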