
Identifying and evaluating clinical subtypes of Alzheimer’s disease in care electronic health records using unsupervised machine learning

BACKGROUND: Alzheimer’s disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data a...


Bibliographic Details
Main authors: Alexander, Nonie, Alexander, Daniel C., Barkhof, Frederik, Denaxas, Spiros
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8653614/
https://www.ncbi.nlm.nih.gov/pubmed/34879829
http://dx.doi.org/10.1186/s12911-021-01693-6
collection PubMed
description BACKGROUND: Alzheimer’s disease (AD) is a highly heterogeneous disease with diverse trajectories and outcomes observed in clinical populations. Understanding this heterogeneity can enable better treatment, prognosis and disease management. Studies to date have mainly used imaging or cognition data and have been limited in data breadth and sample size. Here we examine the clinical heterogeneity of Alzheimer’s disease patients using electronic health records (EHR), applying multiple clustering methods to identify and characterise disease subgroups and to find clusters that are clinically actionable.
METHODS: We identified AD patients in primary care EHR from the Clinical Practice Research Datalink (CPRD) using a previously validated rule-based phenotyping algorithm. We extracted a range of comorbidities, symptoms and demographic characteristics and included them as patient features. We evaluated four clustering methods (k-means, kernel k-means, affinity propagation and latent class analysis) to cluster Alzheimer’s disease patients. We compared clusters on clinically relevant outcomes and evaluated each method using measures of cluster structure, stability, efficiency of outcome prediction and replicability in external data sets.
RESULTS: We identified 7,913 AD patients, with a mean age of 82, of whom 66.2% were female. We included 21 features in our analysis. We observed 5, 2, 5 and 6 clusters with k-means, kernel k-means, affinity propagation and latent class analysis respectively. K-means produced the most consistent results based on the four evaluative measures. One cluster appeared consistently in three of the four methods: it was composed predominantly of female patients with younger disease onset (43% between ages 42–73) who were diagnosed with depression and anxiety, and who progressed more quickly than the average across the other clusters.
CONCLUSION: Each clustering approach produced substantially different clusters, and k-means performed best of the four methods on the four evaluative criteria. However, the consistent appearance of one particular cluster across three of the four methods potentially suggests the presence of a distinct disease subtype that merits further exploration. Our study underlines the variability of the results obtained from different clustering approaches and the importance of systematically evaluating different approaches for identifying disease subtypes in complex EHR.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12911-021-01693-6.
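The abstract names cluster stability as one of the evaluative measures but does not say how it was computed. As a rough illustration only, the sketch below runs a plain Lloyd's k-means from several random starts and averages the pairwise Rand index between the resulting partitions. The data are synthetic binary features, and the function names (`kmeans`, `rand_index`, `stability`, `synthetic_patients`), the six-feature profiles and the choice of the Rand index are all assumptions made for this example; none of it is taken from the authors' pipeline.

```python
import random
from itertools import combinations


def kmeans(points, k, seed, iters=50):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centres = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centres[c])))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: move each centre to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centre
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def rand_index(a, b):
    """Fraction of point pairs on which two labelings agree (same/different)."""
    pairs = list(combinations(range(len(a)), 2))
    agree = sum((a[i] == a[j]) == (b[i] == b[j]) for i, j in pairs)
    return agree / len(pairs)


def stability(points, k, seeds):
    """Mean pairwise Rand index across k-means runs with different seeds."""
    runs = [kmeans(points, k, s) for s in seeds]
    scores = [rand_index(x, y) for x, y in combinations(runs, 2)]
    return sum(scores) / len(scores)


def synthetic_patients(n_per_group, seed):
    """Hypothetical binary features (e.g. comorbidity flags) for two groups."""
    rng = random.Random(seed)
    profiles = [[0.9, 0.9, 0.9, 0.1, 0.1, 0.1],
                [0.1, 0.1, 0.1, 0.9, 0.9, 0.9]]
    return [[1.0 if rng.random() < p else 0.0 for p in profile]
            for profile in profiles for _ in range(n_per_group)]


if __name__ == "__main__":
    patients = synthetic_patients(n_per_group=40, seed=0)
    for k in (2, 3, 4, 5):
        print(k, round(stability(patients, k, seeds=range(5)), 3))
```

A value near 1 means the partition barely depends on the starting centres. The Rand index ignores label names, so two runs that find the same groups under different cluster numbers still count as identical.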
format Online
Article
Text
id pubmed-8653614
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8653614 2021-12-08 BMC Med Inform Decis Mak Research BioMed Central 2021-12-08 /pmc/articles/PMC8653614/ /pubmed/34879829 http://dx.doi.org/10.1186/s12911-021-01693-6 Text en
© The Author(s) 2021. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.
To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
topic Research