A platform for phenotyping disease progression and associated longitudinal risk factors in large-scale EHRs, with application to incident diabetes complications in the UK Biobank

Bibliographic Details
Main Authors: Kim, Do Hyun, Jensen, Aubrey, Jones, Kelly, Raghavan, Sridharan, Phillips, Lawrence S, Hung, Adriana, Sun, Yan V, Li, Gang, Reaven, Peter, Zhou, Hua, Zhou, Jin J
Format: Online Article Text
Language: English
Published: Oxford University Press, 2023
Subjects: Research and Applications
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9912368/
https://www.ncbi.nlm.nih.gov/pubmed/36789288
http://dx.doi.org/10.1093/jamiaopen/ooad006

Abstract
OBJECTIVE: Modern healthcare data reflect massive multi-level and multi-scale information collected over many years. Most existing phenotyping algorithms use case–control definitions of disease. This paper aims to study the time to disease onset and progression and to identify the time-varying risk factors that drive them.

MATERIALS AND METHODS: We developed an algorithmic approach to phenotyping disease incidence by consolidating data sources from the UK Biobank (UKB), including primary care electronic health records (EHRs). We focused on defining events, event dates, and censoring times, which involved including relevant terms and existing phenotypes; excluding generic, rare, or semantically distant terms; forward-mapping terminology terms; and expert review. We applied this approach to phenotyping diabetes complications in the UKB, including a composite cardiovascular disease (CVD) outcome, diabetic kidney disease (DKD), and diabetic retinopathy (DR).

RESULTS: We identified 49 049 participants with diabetes. Among them, 1023 had type 1 diabetes (T1D) and 40 193 had type 2 diabetes (T2D). A total of 23 833 participants with diabetes had linked primary care records. There were 3237, 3113, and 4922 patients with CVD, DKD, and DR events, respectively. We assessed risk prediction performance for each outcome; the resulting areas under the receiver operating characteristic (ROC) curve (AUCs) are consistent with those of standard risk prediction models developed in cohort studies.

DISCUSSION AND CONCLUSION: Our publicly available pipeline and platform enable streamlined curation of incidence events, identification of time-varying risk factors underlying disease progression, and definition of a relevant cohort for time-to-event analyses. These important steps need to be considered simultaneously to study disease progression.
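
The abstract names the parts of an incidence phenotype (an event, its date, and a censoring time) without showing how they fit together. The Python sketch below is one minimal way to combine them for a single participant; it is not the authors' pipeline. Its assumptions are ours: records arrive as (code, date) pairs, the outcome is a hypothetical code set, a code dated on or before baseline marks a prevalent case that is dropped from the incident cohort, and follow-up is censored at the earlier of death and the end of record coverage. The helper name `incident_outcome` is hypothetical, and term selection, terminology forward-mapping, and expert review are outside its scope.

```python
from datetime import date
from typing import Iterable, Optional, Set, Tuple


def incident_outcome(
    records: Iterable[Tuple[str, date]],
    outcome_codes: Set[str],
    baseline: date,
    end_of_followup: date,
    death_date: Optional[date] = None,
) -> Optional[Tuple[bool, int]]:
    """Derive (event observed, follow-up days) for one participant.

    Hypothetical helper for illustration only. Returns None for a prevalent
    case, i.e. an outcome code dated on or before baseline.
    """
    # Dates of all records whose code belongs to the (assumed) outcome code set.
    matched = sorted(d for code, d in records if code in outcome_codes)
    if matched and matched[0] <= baseline:
        return None  # prevalent case: not at risk of an incident event
    # Assumed censoring rule: earlier of death and end of record coverage.
    censor = min(d for d in (end_of_followup, death_date) if d is not None)
    # Outcome records observed on or before the censoring date, earliest first.
    observed = [d for d in matched if d <= censor]
    if observed:
        return True, (observed[0] - baseline).days
    return False, (censor - baseline).days


# Illustrative codes and dates only; not a validated code list.
history = [("E11.3", date(2015, 6, 1)), ("I10", date(2012, 1, 1))]
print(incident_outcome(history, {"E11.3"}, date(2010, 1, 1), date(2020, 1, 1)))
# prints (True, 1977)
```

Returning None for prevalent cases keeps the at-risk cohort definition next to the event definition, which mirrors the abstract's point that events, dates, censoring, and cohort need to be defined together; the baseline and coverage dates are simply taken as inputs here.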

Published in: JAMIA Open (Research and Applications), Oxford University Press, 2023-02-09
© The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.