Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records
OBJECTIVE: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinic...
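Main authors: | Lin, Chen; Karlson, Elizabeth W.; Canhao, Helena; Miller, Timothy A.; Dligach, Dmitriy; Chen, Pei Jun; Perez, Raul Natanael Guzman; Shen, Yuanyan; Weinblatt, Michael E.; Shadick, Nancy A.; Plenge, Robert M.; Savova, Guergana K. |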
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3745469/ https://www.ncbi.nlm.nih.gov/pubmed/23976944 http://dx.doi.org/10.1371/journal.pone.0069932 |
_version_ | 1782280701686579200 |
---|---|
author | Lin, Chen Karlson, Elizabeth W. Canhao, Helena Miller, Timothy A. Dligach, Dmitriy Chen, Pei Jun Perez, Raul Natanael Guzman Shen, Yuanyan Weinblatt, Michael E. Shadick, Nancy A. Plenge, Robert M. Savova, Guergana K. |
author_facet | Lin, Chen Karlson, Elizabeth W. Canhao, Helena Miller, Timothy A. Dligach, Dmitriy Chen, Pei Jun Perez, Raul Natanael Guzman Shen, Yuanyan Weinblatt, Michael E. Shadick, Nancy A. Plenge, Robert M. Savova, Guergana K. |
author_sort | Lin, Chen |
collection | PubMed |
description | OBJECTIVE: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record. MATERIALS AND METHODS: The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values. RESULTS: Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers. CONCLUSION: Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies. |
format | Online Article Text |
id | pubmed-3745469 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3745469 2013-08-23 Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records Lin, Chen Karlson, Elizabeth W. Canhao, Helena Miller, Timothy A. Dligach, Dmitriy Chen, Pei Jun Perez, Raul Natanael Guzman Shen, Yuanyan Weinblatt, Michael E. Shadick, Nancy A. Plenge, Robert M. Savova, Guergana K. PLoS One Research Article OBJECTIVE: We aimed to mine the data in the Electronic Medical Record to automatically discover patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task where the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record. MATERIALS AND METHODS: The Training Set consisted of 2792 clinical notes and associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes for which there were no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System was used to analyze the text and transform it into informative features to be combined with relevant lab values. RESULTS: Experiments over a range of machine learning algorithms and features were conducted. The best performing combination was linear kernel Support Vector Machines with Unified Medical Language System Concept Unique Identifier features with feature selection and lab values. The Area Under the Receiver Operating Characteristic Curve (AUC) is 0.831 (σ = 0.0317), statistically significant as compared to two baselines (AUC = 0.758, σ = 0.0291). Algorithms demonstrated superior performance on cases clinically defined as extreme categories of disease activity (Remission and High) compared to those defined as intermediate categories (Moderate and Low) and included laboratory data on inflammatory markers. CONCLUSION: Automatic Rheumatoid Arthritis disease activity discovery from Electronic Medical Record data is a learnable task approximating human performance. As a result, this approach might have several research applications, such as the identification of patients for genome-wide pharmacogenetic studies that require large sample sizes with precise definitions of disease activity and response to therapies. Public Library of Science 2013-08-16 /pmc/articles/PMC3745469/ /pubmed/23976944 http://dx.doi.org/10.1371/journal.pone.0069932 Text en © 2013 Lin et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Lin, Chen Karlson, Elizabeth W. Canhao, Helena Miller, Timothy A. Dligach, Dmitriy Chen, Pei Jun Perez, Raul Natanael Guzman Shen, Yuanyan Weinblatt, Michael E. Shadick, Nancy A. Plenge, Robert M. Savova, Guergana K. Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records |
title | Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records |
title_full | Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records |
title_fullStr | Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records |
title_full_unstemmed | Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records |
title_short | Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records |
title_sort | automatic prediction of rheumatoid arthritis disease activity from the electronic medical records |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3745469/ https://www.ncbi.nlm.nih.gov/pubmed/23976944 http://dx.doi.org/10.1371/journal.pone.0069932 |
work_keys_str_mv | AT linchen automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT karlsonelizabethw automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT canhaohelena automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT millertimothya automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT dligachdmitriy automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT chenpeijun automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT perezraulnatanaelguzman automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT shenyuanyan automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT weinblattmichaele automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT shadicknancya automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT plengerobertm automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords AT savovaguerganak automaticpredictionofrheumatoidarthritisdiseaseactivityfromtheelectronicmedicalrecords |
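
The description above reports results as the Area Under the Receiver Operating Characteristic Curve (AUC). As a reading aid, the following is a minimal sketch of how an AUC can be computed from classifier scores. It assumes a binary labelling (1 for the class of interest, 0 otherwise) and uses the pairwise rank interpretation of AUC. The function name `roc_auc` and the toy labels and scores are hypothetical illustrations, not data or code from the article, and the record does not state how the authors computed their figures.

```python
def roc_auc(labels, scores):
    """Return the AUC as the fraction of (positive, negative) pairs
    in which the positive example receives the higher score.
    Tied scores count as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: six notes, three labelled positive.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)  # 8 of the 9 positive/negative pairs are ordered correctly
```

An AUC of 0.5 corresponds to scores that carry no ranking information and 1.0 to a perfect ranking, which is the scale on which the reported 0.831 and the baseline 0.758 should be read. The quadratic pairwise loop is chosen here for clarity; a sort-based rank-sum gives the same value more efficiently on large test sets.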