Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in the electronic health record

BACKGROUND: Systemic sclerosis (SSc) is a rare disease with studies limited by small sample sizes. Electronic health records (EHRs) represent a powerful tool to study patients with rare diseases such as SSc, but validated methods are needed. We developed and validated EHR-based algorithms that incor...

Bibliographic Details
Main Authors: Jamian, Lia, Wheless, Lee, Crofford, Leslie J., Barnado, April
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6937803/
https://www.ncbi.nlm.nih.gov/pubmed/31888720
http://dx.doi.org/10.1186/s13075-019-2092-7
_version_ 1783483938983903232
author Jamian, Lia
Wheless, Lee
Crofford, Leslie J.
Barnado, April
collection PubMed
description BACKGROUND: Systemic sclerosis (SSc) is a rare disease with studies limited by small sample sizes. Electronic health records (EHRs) represent a powerful tool to study patients with rare diseases such as SSc, but validated methods are needed. We developed and validated EHR-based algorithms that incorporate billing codes and clinical data to identify SSc patients in the EHR.
METHODS: We used a de-identified EHR with over 3 million subjects and identified 1899 potential SSc subjects with at least 1 count of the SSc ICD-9 (710.1) or ICD-10-CM (M34*) codes. We randomly selected 200 as a training set for chart review. A subject was a case if diagnosed with SSc by a rheumatologist, dermatologist, or pulmonologist. We selected the following algorithm components based on clinical knowledge and available data: SSc ICD-9 and ICD-10-CM codes, positive antinuclear antibody (ANA) (titer ≥ 1:80), and a keyword of Raynaud’s phenomenon (RP). We used both rule-based and machine learning techniques for algorithm development. Positive predictive values (PPVs), sensitivities, and F-scores (which account for PPVs and sensitivities) were calculated for the algorithms.
RESULTS: PPVs were low for algorithms using only 1 count of the SSc ICD-9 code. As code counts increased, the PPVs increased. PPVs were higher for algorithms using ICD-10-CM codes versus the ICD-9 code. Adding a positive ANA and RP keyword increased the PPVs of algorithms only using ICD billing codes. Algorithms using ≥ 3 or ≥ 4 counts of the SSc ICD-9 or ICD-10-CM codes and ANA positivity had the highest PPV at 100% but a low sensitivity at 50%. The algorithm with the highest F-score of 91% was ≥ 4 counts of the ICD-9 or ICD-10-CM codes with an internally validated PPV of 90%. A machine learning method using random forests yielded an algorithm with a PPV of 84%, sensitivity of 92%, and F-score of 88%. The most important feature was the RP keyword.
CONCLUSIONS: Algorithms using only ICD-9 codes did not perform well in identifying SSc patients. The highest performing algorithms incorporated clinical data with billing codes. EHR-based algorithms can identify SSc patients across a healthcare system, enabling researchers to examine important outcomes.
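The rule-based case definition and the metrics named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' code. The `Subject` fields, the function names, and the default thresholds are hypothetical, and the F-score is assumed to be the standard harmonic mean of PPV and sensitivity (F1), which the abstract does not state explicitly. Under that assumption, the reported random forest figures (PPV 84%, sensitivity 92%) give an F-score that rounds to the reported 88%.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Subject:
    """Hypothetical per-subject summary; field names are illustrative only."""
    ssc_code_count: int  # counts of ICD-9 710.1 or ICD-10-CM M34* codes
    ana_positive: bool   # ANA titer >= 1:80
    rp_keyword: bool     # Raynaud's phenomenon keyword found in the record


def rule_based_case(s: Subject, min_codes: int = 4, require_ana: bool = False) -> bool:
    """Flag a probable SSc case using a code-count threshold, optionally with ANA."""
    if s.ssc_code_count < min_codes:
        return False
    return s.ana_positive or not require_ana


def f_score(ppv: float, sensitivity: float) -> float:
    """Harmonic mean of PPV and sensitivity (assumed F1 definition)."""
    if ppv + sensitivity == 0:
        return 0.0
    return 2 * ppv * sensitivity / (ppv + sensitivity)


def evaluate(predicted: list[bool], truth: list[bool]) -> tuple[float, float, float]:
    """Return (PPV, sensitivity, F-score) against chart-review labels."""
    tp = sum(p and t for p, t in zip(predicted, truth))
    fp = sum(p and not t for p, t in zip(predicted, truth))
    fn = sum(t and not p for p, t in zip(predicted, truth))
    ppv = tp / (tp + fp) if tp + fp else 0.0
    sens = tp / (tp + fn) if tp + fn else 0.0
    return ppv, sens, f_score(ppv, sens)


if __name__ == "__main__":
    # Consistency check against the random forest figures in the abstract.
    print(round(f_score(0.84, 0.92), 2))
```

Raising `min_codes` or setting `require_ana=True` makes the rule stricter, which is the trade-off the abstract describes: PPV rises while sensitivity can fall.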
format Online
Article
Text
id pubmed-6937803
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6937803 2019-12-31 Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in the electronic health record Jamian, Lia Wheless, Lee Crofford, Leslie J. Barnado, April Arthritis Res Ther Research Article BioMed Central 2019-12-30 2019 /pmc/articles/PMC6937803/ /pubmed/31888720 http://dx.doi.org/10.1186/s13075-019-2092-7 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Rule-based and machine learning algorithms identify patients with systemic sclerosis accurately in the electronic health record
topic Research Article