Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts
BACKGROUND: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses which can allow study of a specific outcome across different diseases. Such a study in the EMR...
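Main authors: | Liao, Katherine P.; Ananthakrishnan, Ashwin N.; Kumar, Vishesh; Xia, Zongqi; Cagan, Andrew; Gainer, Vivian S.; Goryachev, Sergey; Chen, Pei; Savova, Guergana K.; Agniel, Denis; Churchill, Susanne; Lee, Jaeyoung; Murphy, Shawn N.; Plenge, Robert M.; Szolovits, Peter; Kohane, Isaac; Shaw, Stanley Y.; Karlson, Elizabeth W.; Cai, Tianxi |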
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Public Library of Science
2015
|
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547801/ https://www.ncbi.nlm.nih.gov/pubmed/26301417 http://dx.doi.org/10.1371/journal.pone.0136651 |
_version_ | 1782387112154234880 |
---|---|
author | Liao, Katherine P.; Ananthakrishnan, Ashwin N.; Kumar, Vishesh; Xia, Zongqi; Cagan, Andrew; Gainer, Vivian S.; Goryachev, Sergey; Chen, Pei; Savova, Guergana K.; Agniel, Denis; Churchill, Susanne; Lee, Jaeyoung; Murphy, Shawn N.; Plenge, Robert M.; Szolovits, Peter; Kohane, Isaac; Shaw, Stanley Y.; Karlson, Elizabeth W.; Cai, Tianxi |
author_sort | Liao, Katherine P. |
collection | PubMed |
description | BACKGROUND: Typically, algorithms to classify phenotypes using electronic medical record (EMR) data were developed to perform well in a specific patient population. There is increasing interest in analyses that allow the study of a specific outcome across different diseases. Such a study in the EMR would require an algorithm that can be applied across different patient populations. Our objectives were: (1) to develop an algorithm that would enable the study of coronary artery disease (CAD) across diverse patient populations; and (2) to study the impact of adding narrative data extracted using natural language processing (NLP) to the algorithm. Additionally, we demonstrate how to implement the CAD algorithm to compare risk across 3 chronic diseases in a preliminary study. METHODS AND RESULTS: We studied 3 established EMR-based patient cohorts: diabetes mellitus (DM, n = 65,099), inflammatory bowel disease (IBD, n = 10,974), and rheumatoid arthritis (RA, n = 4,453) from two large academic centers. We developed a CAD algorithm using NLP in addition to structured data (e.g., ICD9 codes) in the RA cohort and validated it in the DM and IBD cohorts. The CAD algorithm using NLP in addition to structured data achieved specificity >95% with a positive predictive value (PPV) of 90% in the training (RA) and validation sets (IBD and DM). The addition of NLP data improved the sensitivity for all cohorts, classifying an additional 17% of CAD subjects in IBD and 10% in DM while maintaining a PPV of 90%. The algorithm classified 16,488 DM (26.1%), 457 IBD (4.2%), and 245 RA (5.0%) subjects as having CAD. In a cross-sectional analysis, CAD risk was 63% lower in RA and 68% lower in IBD compared to DM (p<0.0001) after adjusting for traditional cardiovascular risk factors. CONCLUSIONS: We developed and validated a CAD algorithm that performed well across diverse patient populations. The addition of NLP to the CAD algorithm improved its sensitivity, particularly in cohorts where the prevalence of CAD was low. Preliminary data suggest that CAD risk was significantly lower in RA and IBD compared to DM. |
format | Online Article Text |
id | pubmed-4547801 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-4547801 2015-09-01 PLoS One Research Article Public Library of Science 2015-08-24 /pmc/articles/PMC4547801/ /pubmed/26301417 http://dx.doi.org/10.1371/journal.pone.0136651 Text en © 2015 Liao et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
title | Methods to Develop an Electronic Medical Record Phenotype Algorithm to Compare the Risk of Coronary Artery Disease across 3 Chronic Disease Cohorts |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547801/ https://www.ncbi.nlm.nih.gov/pubmed/26301417 http://dx.doi.org/10.1371/journal.pone.0136651 |
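
The abstract reports the algorithm's performance as sensitivity, specificity, and positive predictive value (PPV). As a minimal sketch of how those three measures relate to chart-review counts, the Python below computes them from a confusion matrix. The class name, function names, and the counts are hypothetical and chosen only for illustration; they are not data from the study and this is not the authors' code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Counts from comparing algorithm output against chart review.

    All values used below are hypothetical, not taken from the paper.
    """
    tp: int  # algorithm says CAD, chart review confirms CAD
    fp: int  # algorithm says CAD, chart review says no CAD
    fn: int  # algorithm says no CAD, chart review confirms CAD
    tn: int  # algorithm says no CAD, chart review says no CAD


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true CAD cases that the algorithm classifies as CAD."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of non-CAD subjects that the algorithm classifies as non-CAD."""
    return c.tn / (c.tn + c.fp)


def ppv(c: ConfusionCounts) -> float:
    """Fraction of algorithm-classified CAD cases that are true CAD."""
    return c.tp / (c.tp + c.fp)


# Hypothetical validation sample of 200 reviewed charts.
counts = ConfusionCounts(tp=45, fp=5, fn=15, tn=135)

print(f"sensitivity = {sensitivity(counts):.3f}")
print(f"specificity = {specificity(counts):.3f}")
print(f"PPV         = {ppv(counts):.3f}")
```

With these made-up counts, specificity is above 95% and PPV is 90%, which mirrors the kind of operating point the abstract describes. Adding a data source that finds more true cases raises `tp` and lowers `fn`, which increases sensitivity; PPV stays unchanged only if false positives do not grow disproportionately.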