Defining Disease Phenotypes in Primary Care Electronic Health Records by a Machine Learning Approach: A Case Study in Identifying Rheumatoid Arthritis

Bibliographic Details
Main authors: Zhou, Shang-Ming, Fernandez-Gutierrez, Fabiola, Kennedy, Jonathan, Cooksey, Roxanne, Atkinson, Mark, Denaxas, Spiros, Siebert, Stefan, Dixon, William G., O’Neill, Terence W., Choy, Ernest, Sudlow, Cathie, Brophy, Sinead
Format: Online article, text
Language: English
Published: Public Library of Science 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4852928/
https://www.ncbi.nlm.nih.gov/pubmed/27135409
http://dx.doi.org/10.1371/journal.pone.0154515
Collection: PubMed
Description:

OBJECTIVES: 1) To use a data-driven method to examine which clinical codes (risk factors) for a medical condition in primary care electronic health records (EHRs) can accurately predict a diagnosis of the condition in secondary care EHRs. 2) To develop and validate a disease phenotyping algorithm for rheumatoid arthritis (RA) using primary care EHRs.

METHODS: This study linked routine primary and secondary care EHRs in Wales, UK. A machine learning-based scheme was used to identify patients with rheumatoid arthritis from primary care EHRs in three steps: i) selection of variables by comparing the relative frequencies of Read codes in the primary care dataset between disease cases and non-disease controls (case/control status being based on the secondary care diagnosis); ii) reduction of the predictors/associated variables using a Random Forest method; and iii) induction of decision rules from a decision tree model. The method was then extensively validated on an independent dataset, and its performance was compared with that of two existing deterministic algorithms for RA that had been developed using expert clinical knowledge.

RESULTS: Primary care EHRs were available for 2,238,360 patients over the age of 16; of these, 20,667 were also linked to the secondary care rheumatology clinical system. In the linked dataset, 900 predictors (out of a total of 43,100 variables) in the primary care record were found more frequently in patients with RA than in those without. These variables were reduced to 37 groups of related clinical codes, which were used to develop a decision tree model. The final algorithm identified 8 predictors, related to diagnostic codes for RA, medication codes (such as those for disease-modifying anti-rheumatic drugs), and the absence of alternative diagnoses such as psoriatic arthritis. The proposed data-driven method performed as well as the methods based on expert clinical knowledge.

CONCLUSION: Data-driven schemes, such as ensemble machine learning methods, have the potential to identify the most informative predictors quickly and cost-effectively, and thereby to classify rheumatoid arthritis or other complex medical conditions accurately and reliably in primary care EHRs.
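Step i of the METHODS (comparing relative frequencies of codes between cases and controls) can be illustrated with a minimal Python sketch. It assumes each patient's primary care record has already been reduced to a set of code strings and that case/control labels come from the linked secondary care diagnosis. The code labels (`RA_DIAG`, `DMARD`, `PSA_DIAG`, `HTN`), the `min_ratio` cut-off, the helper names and the toy rule are all hypothetical; the abstract does not state the exact selection criterion or the published 8-predictor rules. The `evaluate` helper is likewise an assumption: it shows one common way to score a rule-based phenotype against reference labels (sensitivity, specificity, positive predictive value), not necessarily the metrics the authors reported.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def relative_frequency(records: List[Set[str]]) -> Dict[str, float]:
    """Fraction of patients in a group whose record contains each code at least once."""
    n = len(records)
    counts = Counter(code for codes in records for code in set(codes))
    return {code: count / n for code, count in counts.items()}


def screen_codes(
    cases: List[Set[str]], controls: List[Set[str]], min_ratio: float = 2.0
) -> Dict[str, float]:
    """Keep codes whose relative frequency in cases is at least `min_ratio`
    times their relative frequency in controls (hypothetical criterion).

    Codes never seen in controls get an infinite ratio. The result is ordered
    from the most to the least case-enriched code.
    """
    case_freq = relative_frequency(cases)
    control_freq = relative_frequency(controls)
    selected: Dict[str, float] = {}
    for code, f_case in case_freq.items():
        f_control = control_freq.get(code, 0.0)
        ratio = math.inf if f_control == 0 else f_case / f_control
        if ratio >= min_ratio:
            selected[code] = ratio
    return dict(sorted(selected.items(), key=lambda item: item[1], reverse=True))


def evaluate(predicted: List[bool], actual: List[bool]) -> Dict[str, float]:
    """Sensitivity, specificity and PPV of predicted labels against reference labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(a and not p for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)

    def safe_div(num: int, den: int) -> float:
        # Undefined ratios (empty denominator) are reported as NaN.
        return num / den if den else float("nan")

    return {
        "sensitivity": safe_div(tp, tp + fn),
        "specificity": safe_div(tn, tn + fp),
        "ppv": safe_div(tp, tp + fp),
    }


if __name__ == "__main__":
    # Toy records using hypothetical code labels (not real Read codes).
    cases = [{"RA_DIAG", "DMARD"}, {"RA_DIAG"}, {"DMARD", "HTN"}, {"RA_DIAG", "HTN"}]
    controls = [{"HTN"}, {"HTN", "PSA_DIAG"}, {"RA_DIAG"}, set()]
    print(screen_codes(cases, controls))

    # Hypothetical rule for illustration only; NOT the published algorithm.
    def rule(codes: Set[str]) -> bool:
        return "RA_DIAG" in codes and "PSA_DIAG" not in codes

    predicted = [rule(codes) for codes in cases + controls]
    actual = [True] * len(cases) + [False] * len(controls)
    print(evaluate(predicted, actual))
```

Codes that pass this screen would then feed steps ii and iii. With scikit-learn, for instance, that could mean ranking codes by a `RandomForestClassifier`'s `feature_importances_` and then printing the rules of a shallow `DecisionTreeClassifier` with `sklearn.tree.export_text`; the abstract does not specify the authors' implementation.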
Record ID: pubmed-4852928
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One (Research Article), Public Library of Science, 2016-05-02
License: © 2016 Zhou et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.