
SAT-LB111 Improving Classification of Diabetes Etiology in Electronic Resources Using Phenotype Algorithms and Polygenic Risk Scores

Electronic Health Records (EHR) contain rich data to identify and study diabetes. Many phenotype algorithms have been developed to identify research subjects with type 2 diabetes (T2D), but very few accurately identify type 1 diabetes (T1D) cases or more rare forms of monogenic and atypical metaboli...


Bibliographic Details
Main Authors:	Sulieman, Lina; He, Jing; Carroll, Robert; Bastarache, Lisa; Ramirez, Andrea
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Diabetes Mellitus and Glucose Metabolism
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7209076/ http://dx.doi.org/10.1210/jendso/bvaa046.2239

author	Sulieman, Lina; He, Jing; Carroll, Robert; Bastarache, Lisa; Ramirez, Andrea
author_sort	Sulieman, Lina
collection	PubMed
description	Electronic Health Records (EHR) contain rich data to identify and study diabetes. Many phenotype algorithms have been developed to identify research subjects with type 2 diabetes (T2D), but very few accurately identify type 1 diabetes (T1D) cases or rarer forms of monogenic and atypical metabolic presentations. Polygenic risk scores (PRS) quantify the risk of a disease using common genomic variants and perform well for both T1D and T2D. In this study, we apply validated phenotyping algorithms to EHRs linked to a genomic biobank to understand the independent contribution of PRS to the classification of diabetes etiology and to generate additional novel markers that distinguish subtypes of diabetes in EHR data. Using a de-identified mirror of a medical center’s electronic health record, we applied published algorithms for T1D and T2D to identify cases, and used natural language processing and chart review strategies to identify cases of maturity-onset diabetes of the young (MODY) and other rarer presentations. This novel approach included additional data types such as medication sequencing, the ratio and temporality of insulin and non-insulin agents, clinical genetic testing, and ratios of diagnostic codes. Chart review was performed to validate etiology. To calculate PRS, we used genome-wide genotyping from our BioBank, the de-identified biobank linking EHR to genomic data, and applied the coefficients of 65 published T1D SNPs and 76,996 T2D SNPs using PLINK in Caucasian subjects. In the dataset, we identified 82,238 cases of T2D but only 130 cases of T1D using the most cited published algorithms. Adding novel structured elements and natural language processing identified an additional 138 cases of T1D and distinguished 354 cases as MODY. Among over 90,000 subjects with genotyping data available, we included 72,624 Caucasian subjects, since the PRS coefficients were generated in Caucasian cohorts. 
Among those subjects, 248, 6,488, and 21 were identified as T1D, T2D, and MODY cases, respectively, in our final PRS cohort. The T1D PRS discriminated significantly between cases and controls (Mann-Whitney p = 3.4e-17). The PRS for T2D did not significantly discriminate between cases and controls identified using published algorithms. The atypical case count was too low to calculate PRS discrimination. Calculation of the PRS was limited by the quality and inclusion of the available variants, and discrimination may improve in larger data sets. Additionally, blinded physician case review is ongoing to validate the novel classification scheme and to provide a gold standard for machine learning approaches that can be applied in validation sets.
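The two computations named in the description, a PRS built from published per-SNP coefficients and a Mann-Whitney comparison of cases against controls, can be sketched roughly as follows. This is an illustration only, not the authors' pipeline and not PLINK's implementation: the SNP identifiers, weights, and scores are made up, the score is a plain weighted sum of effect-allele dosages with no handling of missing genotypes, and the p-value uses the normal approximation with a tie correction rather than an exact test.

```python
from math import erfc, sqrt


def polygenic_risk_score(dosages, weights):
    """Weighted sum of effect-allele dosages (0, 1 or 2) over SNPs present in both maps."""
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)


def mann_whitney_u(cases, controls):
    """Return (U for cases, two-sided p) via the normal approximation with tie correction."""
    n1, n2 = len(cases), len(controls)
    n = n1 + n2
    # Pool both groups, remembering which group each value came from (0 = case).
    pooled = sorted([(v, 0) for v in cases] + [(v, 1) for v in controls])
    rank_sum_cases = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2  # average of the 1-based ranks i+1 .. j
        t = j - i                  # size of this group of tied values
        tie_term += t ** 3 - t
        rank_sum_cases += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        i = j
    u = rank_sum_cases - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))


# Hypothetical weights and genotypes, for illustration only.
weights = {"rsA": 0.5, "rsB": -0.2, "rsC": 1.0}
subject = {"rsA": 2, "rsB": 1, "rsD": 0}
print(polygenic_risk_score(subject, weights))

# Hypothetical PRS values for a handful of cases and controls.
u_stat, p_value = mann_whitney_u([3.0, 4.0, 5.0], [0.0, 1.0, 2.0])
print(u_stat, p_value)
```

With cohorts of the size reported above, a library routine with an exact or continuity-corrected test would normally be preferred over this hand-rolled approximation.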
format	Online Article Text
id	pubmed-7209076
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-7209076 2020-05-13 SAT-LB111 Improving Classification of Diabetes Etiology in Electronic Resources Using Phenotype Algorithms and Polygenic Risk Scores Sulieman, Lina; He, Jing; Carroll, Robert; Bastarache, Lisa; Ramirez, Andrea J Endocr Soc Diabetes Mellitus and Glucose Metabolism Oxford University Press 2020-05-08 /pmc/articles/PMC7209076/ http://dx.doi.org/10.1210/jendso/bvaa046.2239 Text en © Endocrine Society 2020. http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial-NoDerivs licence (http://creativecommons.org/licenses/by-nc-nd/4.0/), which permits non-commercial reproduction and distribution of the work, in any medium, provided the original work is not altered or transformed in any way, and that the work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title	SAT-LB111 Improving Classification of Diabetes Etiology in Electronic Resources Using Phenotype Algorithms and Polygenic Risk Scores
topic	Diabetes Mellitus and Glucose Metabolism
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7209076/ http://dx.doi.org/10.1210/jendso/bvaa046.2239
