
Identifying subtypes of chronic kidney disease with machine learning: development, internal validation and prognostic validation using linked electronic health records in 350,067 individuals


Bibliographic Details
Main Authors: Dashtban, Ashkan, Mizani, Mehrdad A., Pasea, Laura, Denaxas, Spiros, Corbett, Richard, Mamza, Jil B., Gao, He, Morris, Tamsin, Hemingway, Harry, Banerjee, Amitava
Format: Online Article Text
Language: English
Published: Elsevier 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9989643/
https://www.ncbi.nlm.nih.gov/pubmed/36857859
http://dx.doi.org/10.1016/j.ebiom.2023.104489
collection PubMed
description BACKGROUND: Although chronic kidney disease (CKD) is associated with high multimorbidity, polypharmacy, morbidity and mortality, existing classification systems (mild to severe, usually based on estimated glomerular filtration rate, proteinuria or urine albumin-creatinine ratio) and risk prediction models largely ignore the complexity of CKD, its risk factors and its outcomes. Improved subtype definition could improve prediction of outcomes and inform effective interventions. METHODS: We analysed individuals ≥18 years with incident and prevalent CKD (n = 350,067 and 195,422, respectively) from a population-based electronic health record resource (2006–2020; Clinical Practice Research Datalink, CPRD). We included factors (n = 264, with 2670 derived variables), e.g. demography, history, examination, blood laboratory values and medications. Using a published framework, we identified subtypes through seven unsupervised machine learning (ML) methods (K-means, Diana, HC, Fanny, PAM, Clara, Model-based) with 66 (of 2670) variables in each dataset. We evaluated subtypes for: (i) internal validity (within dataset, across methods); (ii) prognostic validity (predictive accuracy for 5-year all-cause mortality and admissions); and (iii) medications (new and existing, by British National Formulary chapter). FINDINGS: After identifying five clusters across seven approaches, we labelled CKD subtypes: 1. Early-onset, 2. Late-onset, 3. Cancer, 4. Metabolic, and 5. Cardiometabolic. Internal validity: We trained a high-performing model (XGBoost) that predicted disease subtypes with 95% accuracy for incident and prevalent CKD (sensitivity: 0.81–0.98; F1 score: 0.84–0.97). Prognostic validity: 5-year all-cause mortality, hospital admissions, and incidence of new chronic diseases differed across CKD subtypes.
In the overall incident CKD population, the 5-year risks of mortality and admissions were highest in the cardiometabolic subtype: 43.3% (42.3–42.8%) and 29.5% (29.1–30.0%), respectively, and lowest in the early-onset subtype: 5.7% (5.5–5.9%) and 18.7% (18.4–19.1%). Medications: Across CKD subtypes, the distribution of prescription medication classes at baseline varied, with the highest medication burden in the cardiometabolic and metabolic subtypes, and a higher burden in prevalent than in incident CKD. INTERPRETATION: In the largest CKD study using ML to date, we identified five distinct subtypes in individuals with incident and prevalent CKD. These subtypes are relevant to the study of aetiology, therapeutics and risk prediction. FUNDING: AstraZeneca UK Ltd, Health Data Research UK.
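The internal-validity results above are reported as overall accuracy plus per-subtype ranges of sensitivity and F1 for a classifier that predicts subtype labels. The following is a minimal sketch, not the authors' code, of how such per-class metrics can be computed from true and predicted labels, assuming plain Python lists of labels. The function name `per_class_metrics` and the toy labels are hypothetical, and the XGBoost training step itself is not shown.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred):
    """Return overall accuracy and a dict mapping each class to (sensitivity, F1).

    Hypothetical helper for illustration; labels can be any hashable values.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    classes = sorted(set(y_true) | set(y_pred))
    # True positives per class: positions where prediction equals the true label.
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    true_count = Counter(y_true)  # TP + FN for each class
    pred_count = Counter(y_pred)  # TP + FP for each class
    accuracy = sum(tp.values()) / len(y_true)
    metrics = {}
    for c in classes:
        # Sensitivity (recall) = TP / (TP + FN); precision = TP / (TP + FP).
        sens = tp[c] / true_count[c] if true_count[c] else 0.0
        prec = tp[c] / pred_count[c] if pred_count[c] else 0.0
        # F1 is the harmonic mean of precision and sensitivity.
        f1 = 2 * prec * sens / (prec + sens) if (prec + sens) else 0.0
        metrics[c] = (sens, f1)
    return accuracy, metrics


# Hypothetical toy labels, not study data.
y_true = ["early", "early", "late", "late", "cancer"]
y_pred = ["early", "late", "late", "late", "cancer"]
acc, m = per_class_metrics(y_true, y_pred)
print(acc)  # 0.8
```

Reporting the minimum and maximum of the per-class values gives ranges of the kind quoted in the abstract (e.g. "Sensitivity: 0.81–0.98").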
format Online
Article
Text
id pubmed-9989643
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-9989643 2023-03-08 eBioMedicine Articles Elsevier 2023-02-27 /pmc/articles/PMC9989643/ /pubmed/36857859 http://dx.doi.org/10.1016/j.ebiom.2023.104489 Text en © 2023 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
topic Articles