Temporal characterization of Alzheimer's Disease with sequences of clinical records

Bibliographic Details
Main authors: Estiri, Hossein; Azhir, Alaleh; Blacker, Deborah L.; Ritchie, Christine S.; Patel, Chirag J.; Murphy, Shawn N.
Format: Online, Article, Text
Language: English
Published: Elsevier, 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10236187/
https://www.ncbi.nlm.nih.gov/pubmed/37247495
http://dx.doi.org/10.1016/j.ebiom.2023.104629
Collection: PubMed
Abstract:
BACKGROUND: Alzheimer's Disease (AD) is a complex clinical phenotype with unprecedented social and economic tolls on an ageing global population. Real-world data (RWD) from electronic health records (EHRs) offer opportunities to accelerate precision drug development and scale epidemiological research on AD. A precise characterization of AD cohorts is needed to address the abundant noise in RWD.
METHODS: We conducted a retrospective cohort study to develop and test computational models for AD cohort identification using clinical data from 8 Massachusetts healthcare systems. We mined temporal representations from EHR data with the transitive sequential pattern mining algorithm (tSPM) to train and validate our models. We then tested our models against a held-out test set in which the presence of AD was adjudicated by medical record review. We trained two classes of machine learning models with a Gradient Boosting Machine (GBM) to compare the utility of AD diagnosis records versus tSPM temporal representations (sequences of diagnosis and medication observations) from electronic medical records for characterizing AD cohorts.
FINDINGS: In a group of 4985 patients, we identified 219 tSPM temporal representations (i.e., transitive sequences) of medical records for constructing the best classification models. The models with sequential features improved AD classification by 3–16 percent over the use of AD diagnosis codes alone. The computed cohort included 663 patients, 35 of whom had no record of AD. Six groups of tSPM sequences were identified for characterizing the AD cohorts.
INTERPRETATION: We present sequential patterns of diagnosis and medication codes from electronic medical records as digital markers of Alzheimer's Disease. Classification algorithms developed on sequential patterns can replace standard features from EHRs to enrich phenotype modelling.
FUNDING: National Institutes of Health (Funder ID 10.13039/100000002): the National Institute on Aging (Funder ID 10.13039/100000049; grant RF1AG074372) and the National Institute of Allergy and Infectious Diseases (Funder ID 10.13039/100000060; grant R01AI165535).
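The METHODS paragraph names the transitive sequential pattern mining algorithm (tSPM) without spelling it out. As a rough illustration only, the Python sketch below shows the general idea of transitive sequences: for each patient, every ordered pair of codes in which one code is recorded before another becomes a candidate feature such as `A->B`, including non-adjacent pairs. The record layout `(patient_id, code, date)`, the function names `first_occurrences`, `transitive_sequences` and `feature_matrix`, the placeholder codes, and the choice to keep only each code's first occurrence per patient are all hypothetical assumptions for this sketch. It is not the authors' tSPM implementation, and it does not cover how the study selected its 219 sequences or trained the GBM models.

```python
from collections import defaultdict
from datetime import date
from itertools import combinations


def first_occurrences(records):
    """Return {(patient_id, code): earliest_date}.

    Assumption for this sketch (hypothetical): only the first occurrence of
    each code per patient is used; later repeats are ignored.
    """
    first = {}
    for patient_id, code, day in records:
        key = (patient_id, code)
        if key not in first or day < first[key]:
            first[key] = day
    return first


def transitive_sequences(records):
    """Return {patient_id: set of "A->B" strings}.

    A pair A->B is emitted whenever the first occurrence of A falls strictly
    before the first occurrence of B. All ordered pairs are kept, not only
    adjacent ones, so a patient with A, then B, then C gets A->B, B->C and
    the transitive pair A->C.
    """
    events_by_patient = defaultdict(list)
    for (patient_id, code), day in first_occurrences(records).items():
        events_by_patient[patient_id].append((day, code))

    sequences = {}
    for patient_id, events in events_by_patient.items():
        events.sort()  # chronological order (ties broken by code)
        pairs = set()
        for (day_a, code_a), (day_b, code_b) in combinations(events, 2):
            # Same-day records have no known order, so no pair is formed.
            if day_a < day_b:
                pairs.add(f"{code_a}->{code_b}")
        sequences[patient_id] = pairs
    return sequences


def feature_matrix(sequences):
    """Turn per-patient sequence sets into binary feature rows.

    Returns (columns, rows): columns is the sorted list of distinct
    sequences; rows maps each patient_id to a 0/1 list aligned with columns.
    """
    columns = sorted(set().union(*sequences.values()))
    rows = {
        patient_id: [int(column in pairs) for column in columns]
        for patient_id, pairs in sequences.items()
    }
    return columns, rows


if __name__ == "__main__":
    # Hypothetical toy records; codes are placeholders, not study data.
    toy_records = [
        ("p1", "DX_A", date(2015, 1, 10)),
        ("p1", "MED_B", date(2016, 3, 2)),
        ("p1", "DX_C", date(2018, 7, 9)),
        ("p1", "DX_A", date(2019, 1, 1)),  # repeat of DX_A: ignored
        ("p2", "MED_B", date(2017, 5, 5)),
        ("p2", "DX_A", date(2017, 5, 5)),  # same day as MED_B: no pair
    ]
    columns, rows = feature_matrix(transitive_sequences(toy_records))
    print(columns)
    print(rows)
```

In this toy run, patient `p1` produces `DX_A->MED_B`, `MED_B->DX_C` and the transitive pair `DX_A->DX_C`, while `p2` produces none because both codes share a date. Binary rows of this kind could feed a tabular classifier such as GBM; the abstract does not describe how the study encoded or filtered its features.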
Record ID: pubmed-10236187
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: eBioMedicine (section: Articles)
Published: Elsevier, 2023-05-27 (index record dated 2023-06-03)
Copyright: © 2023 The Author(s). This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).