Validation of a machine learning approach to estimate Systemic Lupus Erythematosus Disease Activity Index score categories and application in a real-world dataset

Bibliographic Details
Main authors: Alves, Pedro; Bandaria, Jigar; Leavy, Michelle B; Gliklich, Benjamin; Boussios, Costas; Su, Zhaohui; Curhan, Gary
Format: Online Article Text
Language: English
Published: BMJ Publishing Group 2021
Subjects: Lupus
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8141448/
https://www.ncbi.nlm.nih.gov/pubmed/34016712
http://dx.doi.org/10.1136/rmdopen-2021-001586
author Alves, Pedro
Bandaria, Jigar
Leavy, Michelle B
Gliklich, Benjamin
Boussios, Costas
Su, Zhaohui
Curhan, Gary
collection PubMed
description OBJECTIVE: Use of the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) in routine clinical practice is inconsistent, and availability of clinician-recorded SLEDAI scores in real-world datasets is limited. This study aimed to validate a machine learning model to estimate SLEDAI score categories using clinical notes and to apply the model to a large, real-world dataset to generate estimated score categories for use in future research studies. METHODS: A machine learning model was developed to estimate an individual patient’s SLEDAI score category (no activity, mild activity, moderate activity or high/very high activity) for a specific encounter date using clinical notes. A training cohort of 3504 encounters and a separate validation cohort of 1576 encounters were created from the OM1 SLE Registry. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), calculated using a binarised version of the outcome that sets the positive class to be those records with clinician-recorded SLEDAI scores >5 and the negative class to be records with scores ≤5. Model performance was evaluated by categorising the scores into the four disease activity categories and by calculating the Spearman’s R value and Pearson’s R value. RESULTS: The AUC for the two categories was 0.93 for the development cohort and 0.91 for the validation cohort. The model had a Spearman’s R value of 0.7 and a Pearson’s R value of 0.7 when calculated using the four disease activity categories. CONCLUSION: The model performs well when estimating SLEDAI score categories using unstructured clinical notes.
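The evaluation described in the abstract (binarising clinician-recorded scores at >5 for the AUC, then computing Spearman's and Pearson's R) can be illustrated with a minimal, standard-library sketch. This is not the authors' implementation: the function names and the toy inputs are assumptions made for illustration only, and the four-category cut-offs are not given in this record, so they are not reproduced here.

```python
from math import sqrt


def binarise(scores, threshold=5):
    """Label a record 1 (positive) when its recorded score is > threshold, else 0."""
    return [1 if s > threshold else 0 for s in scores]


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def auc(labels, predictions):
    """Area under the ROC curve via the rank-sum (Mann-Whitney U) identity."""
    ranks = average_ranks(predictions)
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy values invented for illustration; they are NOT data from the study.
recorded_scores = [0, 2, 4, 6, 8, 12]
model_outputs = [0.1, 0.3, 0.2, 0.7, 0.6, 0.9]

labels = binarise(recorded_scores)
print("AUC:", auc(labels, model_outputs))
print("Spearman:", spearman(recorded_scores, model_outputs))
print("Pearson:", pearson(recorded_scores, model_outputs))
```

The rank-sum form of the AUC is used here only because it needs no third-party library; a package such as scikit-learn or SciPy would normally be the practical choice.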
format Online
Article
Text
id pubmed-8141448
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BMJ Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-8141448 2021-06-07 RMD Open Lupus
BMJ Publishing Group 2021-05-20 /pmc/articles/PMC8141448/ /pubmed/34016712 http://dx.doi.org/10.1136/rmdopen-2021-001586 Text en © Author(s) (or their employer(s)) 2021. Re-use permitted under CC BY-NC. No commercial re-use. See rights and permissions. Published by BMJ. https://creativecommons.org/licenses/by-nc/4.0/ This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
title Validation of a machine learning approach to estimate Systemic Lupus Erythematosus Disease Activity Index score categories and application in a real-world dataset
topic Lupus