SAT-LB111 Improving Classification of Diabetes Etiology in Electronic Resources Using Phenotype Algorithms and Polygenic Risk Scores

Bibliographic Details
Main authors: Sulieman, Lina; He, Jing; Carroll, Robert; Bastarache, Lisa; Ramirez, Andrea
Format: Online article, text
Language: English
Published: Oxford University Press 2020
Subjects: Diabetes Mellitus and Glucose Metabolism
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7209076/
http://dx.doi.org/10.1210/jendso/bvaa046.2239
collection PubMed
description Electronic Health Records (EHR) contain rich data for identifying and studying diabetes. Many phenotype algorithms have been developed to identify research subjects with type 2 diabetes (T2D), but very few accurately identify type 1 diabetes (T1D) cases or rarer monogenic and atypical metabolic presentations. Polygenic risk scores (PRS), which quantify disease risk from common genomic variants, perform well for both T1D and T2D. In this study, we applied validated phenotyping algorithms to EHRs linked to a genomic biobank to assess the independent contribution of PRS to classifying diabetes etiology and to generate additional novel markers that distinguish diabetes subtypes in EHR data. Using a de-identified mirror of our medical center’s electronic health record, we applied published algorithms for T1D and T2D to identify cases, and used natural language processing (NLP) and chart review to identify cases of maturity-onset diabetes of the young (MODY) and other rare presentations. This novel approach incorporated additional data types, such as medication sequencing, the ratio and timing of insulin versus non-insulin agents, clinical genetic testing, and ratios of diagnostic codes. Chart review was performed to validate etiology. To calculate PRS, we used genome-wide genotyping data from our BioBank, a de-identified biobank that links EHR and genomic data, and computed scores with PLINK in Caucasian subjects using published coefficients for 65 T1D SNPs and 76,996 T2D SNPs. Using the most-cited published algorithms, we identified 82,238 T2D cases but only 130 T1D cases in the dataset. Adding novel structured elements and NLP identified an additional 138 T1D cases and distinguished 354 cases as MODY. Of more than 90,000 subjects with genotyping data, we included the 72,624 Caucasian subjects, because the PRS coefficients were derived in Caucasian cohorts.
In this final PRS cohort, 248 subjects were classified as T1D, 6,488 as T2D, and 21 as MODY. The T1D PRS discriminated significantly between cases and controls (Mann-Whitney p = 3.4 × 10⁻¹⁷). The T2D PRS did not discriminate significantly between cases and controls identified by the published algorithms. The number of atypical cases was too small to assess PRS discrimination. PRS calculation was limited by the quality and availability of variants for inclusion, and discrimination may improve in larger datasets. Additionally, blinded physician case review is ongoing to validate the novel classification scheme and to provide a gold standard for machine learning approaches that can then be applied to validation sets.
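The PRS and case/control discrimination steps in the abstract can be illustrated with a minimal Python sketch. It assumes a simple additive model, in which a subject's score is the sum of each SNP's published coefficient times that subject's effect-allele dosage, and compares case and control scores with a Mann-Whitney U test, whose U statistic divided by n_cases × n_controls equals the AUC. This is not the authors' pipeline and not PLINK: the function names, SNP IDs (rs1, rs2, rs3), and weights are hypothetical toy values, and PLINK's own scoring options (for example, averaging versus summing, or handling of missing genotypes) may differ. The p-value here uses a normal approximation without tie correction.

```python
# Minimal sketch (hypothetical, not the authors' pipeline or PLINK):
# additive polygenic risk score plus a Mann-Whitney case/control comparison.
import math
from typing import Dict, Sequence, Tuple


def polygenic_risk_score(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Sum of beta * effect-allele dosage (0, 1 or 2) over weighted SNPs.

    SNPs absent from the subject's genotype data are skipped, so the score
    only covers variants that are actually available.
    """
    return sum(beta * dosages[snp] for snp, beta in weights.items() if snp in dosages)


def mann_whitney(cases: Sequence[float], controls: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U for cases, AUC = U / (n1 * n2), two-sided p-value).

    Tied scores receive average ranks; the p-value uses the normal
    approximation without a tie correction.
    """
    # Label each score with its group: 0 = case, 1 = control.
    pooled = sorted([(x, 0) for x in cases] + [(x, 1) for x in controls], key=lambda t: t[0])
    ranks = [0.0] * len(pooled)
    i = 0
    while i < len(pooled):
        j = i
        # Extend j to the end of a run of tied scores.
        while j + 1 < len(pooled) and pooled[j + 1][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average 1-based rank for the tie run
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1

    n1, n2 = len(cases), len(controls)
    rank_sum_cases = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u_cases = rank_sum_cases - n1 * (n1 + 1) / 2
    auc = u_cases / (n1 * n2)

    # Normal approximation to the null distribution of U.
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u_cases - mean_u) / sd_u
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))
    return u_cases, auc, p_two_sided


if __name__ == "__main__":
    # Hypothetical SNP IDs and effect sizes, for illustration only.
    weights = {"rs1": 0.5, "rs2": -0.2, "rs3": 1.0}
    case_genotypes = [
        {"rs1": 2, "rs2": 0, "rs3": 1},
        {"rs1": 1, "rs2": 1, "rs3": 2},
        {"rs1": 2, "rs2": 1, "rs3": 2},
    ]
    control_genotypes = [
        {"rs1": 0, "rs2": 2, "rs3": 0},
        {"rs1": 1, "rs2": 0, "rs3": 0},
        {"rs1": 0, "rs2": 1},  # rs3 not genotyped for this subject
    ]
    case_scores = [polygenic_risk_score(g, weights) for g in case_genotypes]
    control_scores = [polygenic_risk_score(g, weights) for g in control_genotypes]
    u, auc, p = mann_whitney(case_scores, control_scores)
    # Every toy case outscores every toy control, so AUC is 1.0 here.
    print(f"U = {u}, AUC = {auc:.2f}, p = {p:.3g}")
```

With real data, the weights would come from the published T1D or T2D coefficients and the dosages from BioBank genotypes; for cohorts of tens of thousands of subjects, a library routine such as scipy.stats.mannwhitneyu would typically be preferable to this hand-rolled version. Skipping missing SNPs, as done here, is one simple way to reflect the variant-availability limitation the abstract mentions.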
id pubmed-7209076
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-7209076 2020-05-13 J Endocr Soc Diabetes Mellitus and Glucose Metabolism Oxford University Press 2020-05-08 /pmc/articles/PMC7209076/ http://dx.doi.org/10.1210/jendso/bvaa046.2239 Text en © Endocrine Society 2020. http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Diabetes Mellitus and Glucose Metabolism