A platform for phenotyping disease progression and associated longitudinal risk factors in large-scale EHRs, with application to incident diabetes complications in the UK Biobank
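Main authors: | Kim, Do Hyun; Jensen, Aubrey; Jones, Kelly; Raghavan, Sridharan; Phillips, Lawrence S; Hung, Adriana; Sun, Yan V; Li, Gang; Reaven, Peter; Zhou, Hua; Zhou, Jin J |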
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9912368/ https://www.ncbi.nlm.nih.gov/pubmed/36789288 http://dx.doi.org/10.1093/jamiaopen/ooad006 |
author | Kim, Do Hyun; Jensen, Aubrey; Jones, Kelly; Raghavan, Sridharan; Phillips, Lawrence S; Hung, Adriana; Sun, Yan V; Li, Gang; Reaven, Peter; Zhou, Hua; Zhou, Jin J |
collection | PubMed |
description | OBJECTIVE: Modern healthcare data reflect massive multi-level and multi-scale information collected over many years. The majority of the existing phenotyping algorithms use case–control definitions of disease. This paper aims to study the time to disease onset and progression and identify the time-varying risk factors that drive them. MATERIALS AND METHODS: We developed an algorithmic approach to phenotyping the incidence of diseases by consolidating data sources from the UK Biobank (UKB), including primary care electronic health records (EHRs). We focused on defining events, event dates, and their censoring time, including relevant terms and existing phenotypes, excluding generic, rare, or semantically distant terms, forward-mapping terminology terms, and expert review. We applied our approach to phenotyping diabetes complications, including a composite cardiovascular disease (CVD) outcome, diabetic kidney disease (DKD), and diabetic retinopathy (DR), in the UKB study. RESULTS: We identified 49 049 participants with diabetes. Among them, 1023 had type 1 diabetes (T1D), and 40 193 had type 2 diabetes (T2D). A total of 23 833 diabetes subjects had linked primary care records. There were 3237, 3113, and 4922 patients with CVD, DKD, and DR events, respectively. The risk prediction performance for each outcome was assessed, and our results are consistent with the prediction area under the ROC (receiver operating characteristic) curve (AUC) of standard risk prediction models using cohort studies. DISCUSSION AND CONCLUSION: Our publicly available pipeline and platform enable streamlined curation of incidence events, identification of time-varying risk factors underlying disease progression, and the definition of a relevant cohort for time-to-event analyses. These important steps need to be considered simultaneously to study disease progression. |
format | Online Article Text |
id | pubmed-9912368 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
source | JAMIA Open, Research and Applications. Oxford University Press, 2023-02-09 |
rights | © The Author(s) 2023. Published by Oxford University Press on behalf of the American Medical Informatics Association. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A platform for phenotyping disease progression and associated longitudinal risk factors in large-scale EHRs, with application to incident diabetes complications in the UK Biobank |
topic | Research and Applications |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9912368/ https://www.ncbi.nlm.nih.gov/pubmed/36789288 http://dx.doi.org/10.1093/jamiaopen/ooad006 |