Automatic Prediction of Rheumatoid Arthritis Disease Activity from the Electronic Medical Records

Bibliographic Details
Main Authors: Lin, Chen, Karlson, Elizabeth W., Canhao, Helena, Miller, Timothy A., Dligach, Dmitriy, Chen, Pei Jun, Perez, Raul Natanael Guzman, Shen, Yuanyan, Weinblatt, Michael E., Shadick, Nancy A., Plenge, Robert M., Savova, Guergana K.
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3745469/
https://www.ncbi.nlm.nih.gov/pubmed/23976944
http://dx.doi.org/10.1371/journal.pone.0069932
Collection: PubMed
Abstract:
OBJECTIVE: We aimed to mine Electronic Medical Record data to automatically determine patients' Rheumatoid Arthritis disease activity at discrete rheumatology clinic visits. We cast the problem as a document classification task in which the feature space includes concepts from the clinical narrative and lab values as stored in the Electronic Medical Record.

MATERIALS AND METHODS: The Training Set consisted of 2792 clinical notes and their associated lab values. Test Set 1 included 1749 clinical notes and associated lab values. Test Set 2 included 344 clinical notes with no associated lab values. The Apache clinical Text Analysis and Knowledge Extraction System (cTAKES) was used to analyze the text and transform it into informative features to be combined with relevant lab values.

RESULTS: We conducted experiments over a range of machine learning algorithms and feature sets. The best-performing combination was a linear-kernel Support Vector Machine using Unified Medical Language System Concept Unique Identifier features with feature selection, together with lab values. Its Area Under the Receiver Operating Characteristic Curve (AUC) was 0.831 (σ = 0.0317), a statistically significant improvement over two baselines (AUC = 0.758, σ = 0.0291). The algorithms performed better on cases clinically defined as extreme categories of disease activity (Remission and High) than on intermediate categories (Moderate and Low), and on cases that included laboratory data on inflammatory markers.

CONCLUSION: Automatic discovery of Rheumatoid Arthritis disease activity from Electronic Medical Record data is a learnable task that approximates human performance. This approach may therefore support several research applications, such as identifying patients for genome-wide pharmacogenetic studies, which require large sample sizes with precise definitions of disease activity and response to therapies.
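The abstract describes clinical notes converted into UMLS Concept Unique Identifier (CUI) features, combined with lab values, and classified with a linear-kernel SVM evaluated by AUC. The following is a minimal pure-Python sketch of that kind of pipeline under stated assumptions; it is not the authors' implementation. It assumes CUIs have already been extracted for each note (cTAKES itself is not called), represents each note as a hypothetical dict with `cuis` and `labs` keys, replaces the unspecified feature selection with a hypothetical document-frequency cutoff, treats the task as binary (for example, one disease-activity category versus the rest), and substitutes a Pegasos-style subgradient solver for whatever SVM package the study actually used. AUC is computed as the probability that a randomly chosen positive note outscores a randomly chosen negative one.

```python
import random
from collections import Counter


def select_cuis(notes, min_df=2):
    """Keep CUIs that occur in at least `min_df` notes (hypothetical stand-in
    for the abstract's unspecified feature selection)."""
    df = Counter(cui for note in notes for cui in set(note["cuis"]))
    return sorted(cui for cui, n in df.items() if n >= min_df)


def featurize(note, vocab, lab_names):
    """Concatenate CUI counts, lab values, and a constant bias feature."""
    counts = Counter(note["cuis"])
    cui_part = [float(counts[c]) for c in vocab]
    # Assumption: a missing lab value is encoded as 0.0.
    lab_part = [float(note["labs"].get(name, 0.0)) for name in lab_names]
    return cui_part + lab_part + [1.0]  # bias term, regularized like the other weights


def score(w, x):
    """Linear decision value w . x."""
    return sum(wj * xj for wj, xj in zip(w, x))


def train_linear_svm(X, y, lam=0.01, epochs=50, seed=0):
    """Pegasos-style stochastic subgradient descent on the L2-regularized
    hinge loss. Labels must be -1 or +1."""
    rng = random.Random(seed)
    w = [0.0] * len(X[0])
    order = list(range(len(X)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y[i] * score(w, X[i])
            w = [(1.0 - eta * lam) * wj for wj in w]  # shrink: regularization step
            if margin < 1.0:  # hinge loss active: move toward y_i * x_i
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, X[i])]
    return w


def auc(scores, labels):
    """ROC AUC as the probability that a random positive (label 1) outscores
    a random negative; ties count one half."""
    pos = [s for s, lbl in zip(scores, labels) if lbl == 1]
    neg = [s for s, lbl in zip(scores, labels) if lbl != 1]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

In use, the vocabulary would be built from training notes only, both training and test notes featurized with that same vocabulary, and the test-set scores passed to `auc` together with the gold labels. Encoding absent labs as 0.0 is a simplification; the abstract's Test Set 2, which has no lab values, suggests that missing labs deserve deliberate handling.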
Record ID: pubmed-3745469
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Journal: PLoS One
Article Type: Research Article
Publication Date: 2013-08-16
Copyright: © 2013 Lin et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.