Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome
IMPORTANCE: Many clinical trial outcomes are documented in free-text electronic health records (EHRs), making manual data collection costly and infeasible at scale. Natural language processing (NLP) is a promising approach for measuring such outcomes efficiently, but ignoring NLP-related misclassifi...
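Main Authors: | Lee, Robert Y., Kross, Erin K., Torrence, Janaki, Li, Kevin S., Sibley, James, Cohen, Trevor, Lober, William B., Engelberg, Ruth A., Curtis, J. Randall |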
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Medical Association 2023 |
Subjects: | Original Investigation |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9982698/ https://www.ncbi.nlm.nih.gov/pubmed/36862411 http://dx.doi.org/10.1001/jamanetworkopen.2023.1204 |
_version_ | 1784900381383327744 |
---|---|
author | Lee, Robert Y. Kross, Erin K. Torrence, Janaki Li, Kevin S. Sibley, James Cohen, Trevor Lober, William B. Engelberg, Ruth A. Curtis, J. Randall |
author_facet | Lee, Robert Y. Kross, Erin K. Torrence, Janaki Li, Kevin S. Sibley, James Cohen, Trevor Lober, William B. Engelberg, Ruth A. Curtis, J. Randall |
author_sort | Lee, Robert Y. |
collection | PubMed |
description | IMPORTANCE: Many clinical trial outcomes are documented in free-text electronic health records (EHRs), making manual data collection costly and infeasible at scale. Natural language processing (NLP) is a promising approach for measuring such outcomes efficiently, but ignoring NLP-related misclassification may lead to underpowered studies. OBJECTIVE: To evaluate the performance, feasibility, and power implications of using NLP to measure the primary outcome of EHR-documented goals-of-care discussions in a pragmatic randomized clinical trial of a communication intervention. DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study compared the performance, feasibility, and power implications of measuring EHR-documented goals-of-care discussions using 3 approaches: (1) deep-learning NLP, (2) NLP-screened human abstraction (manual verification of NLP-positive records), and (3) conventional manual abstraction. The study included hospitalized patients aged 55 years or older with serious illness enrolled between April 23, 2020, and March 26, 2021, in a pragmatic randomized clinical trial of a communication intervention in a multihospital US academic health system. MAIN OUTCOMES AND MEASURES: Main outcomes were natural language processing performance characteristics, human abstractor-hours, and misclassification-adjusted statistical power of methods of measuring clinician-documented goals-of-care discussions. Performance of NLP was evaluated with receiver operating characteristic (ROC) curves and precision-recall (PR) analyses, and the effects of misclassification on power were examined using mathematical substitution and Monte Carlo simulation. RESULTS: A total of 2512 trial participants (mean [SD] age, 71.7 [10.8] years; 1456 [58%] female) amassed 44 324 clinical notes during 30-day follow-up. In a validation sample of 159 participants, deep-learning NLP trained on a separate training data set identified patients with documented goals-of-care discussions with moderate accuracy (maximal F1 score, 0.82; area under the ROC curve, 0.924; area under the PR curve, 0.879). Manual abstraction of the outcome from the trial data set would require an estimated 2000 abstractor-hours and would power the trial to detect a risk difference of 5.4% (assuming 33.5% control-arm prevalence, 80% power, and 2-sided α = .05). Measuring the outcome by NLP alone would power the trial to detect a risk difference of 7.6%. Measuring the outcome by NLP-screened human abstraction would require 34.3 abstractor-hours to achieve estimated sensitivity of 92.6% and would power the trial to detect a risk difference of 5.7%. Monte Carlo simulations corroborated misclassification-adjusted power calculations. CONCLUSIONS AND RELEVANCE: In this diagnostic study, deep-learning NLP and NLP-screened human abstraction had favorable characteristics for measuring an EHR outcome at scale. Adjusted power calculations accurately quantified power loss from NLP-related misclassification, suggesting that incorporation of this approach into the design of studies using NLP would be beneficial. |
format | Online Article Text |
id | pubmed-9982698 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | American Medical Association |
record_format | MEDLINE/PubMed |
spelling | pubmed-99826982023-03-04 Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome Lee, Robert Y. Kross, Erin K. Torrence, Janaki Li, Kevin S. Sibley, James Cohen, Trevor Lober, William B. Engelberg, Ruth A. Curtis, J. Randall JAMA Netw Open Original Investigation American Medical Association 2023-03-02 /pmc/articles/PMC9982698/ /pubmed/36862411 http://dx.doi.org/10.1001/jamanetworkopen.2023.1204 Text en Copyright 2023 Lee RY et al. JAMA Network Open. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License. |
spellingShingle | Original Investigation Lee, Robert Y. Kross, Erin K. Torrence, Janaki Li, Kevin S. Sibley, James Cohen, Trevor Lober, William B. Engelberg, Ruth A. Curtis, J. Randall Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome |
title | Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome |
title_full | Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome |
title_fullStr | Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome |
title_full_unstemmed | Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome |
title_short | Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome |
title_sort | assessment of natural language processing of electronic health records to measure goals-of-care discussions as a clinical trial outcome |
topic | Original Investigation |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9982698/ https://www.ncbi.nlm.nih.gov/pubmed/36862411 http://dx.doi.org/10.1001/jamanetworkopen.2023.1204 |
work_keys_str_mv | AT leeroberty assessmentofnaturallanguageprocessingofelectronichealthrecordstomeasuregoalsofcarediscussionsasaclinicaltrialoutcome AT krosserink assessmentofnaturallanguageprocessingofelectronichealthrecordstomeasuregoalsofcarediscussionsasaclinicaltrialoutcome AT torrencejanaki assessmentofnaturallanguageprocessingofelectronichealthrecordstomeasuregoalsofcarediscussionsasaclinicaltrialoutcome AT likevins assessmentofnaturallanguageprocessingofelectronichealthrecordstomeasuregoalsofcarediscussionsasaclinicaltrialoutcome AT sibleyjames assessmentofnaturallanguageprocessingofelectronichealthrecordstomeasuregoalsofcarediscussionsasaclinicaltrialoutcome AT cohentrevor assessmentofnaturallanguageprocessingofelectronichealthrecordstomeasuregoalsofcarediscussionsasaclinicaltrialoutcome AT loberwilliamb assessmentofnaturallanguageprocessingofelectronichealthrecordstomeasuregoalsofcarediscussionsasaclinicaltrialoutcome AT engelbergrutha assessmentofnaturallanguageprocessingofelectronichealthrecordstomeasuregoalsofcarediscussionsasaclinicaltrialoutcome AT curtisjrandall assessmentofnaturallanguageprocessingofelectronichealthrecordstomeasuregoalsofcarediscussionsasaclinicaltrialoutcome |
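
The abstract's misclassification-adjusted power figures can be approximated with a short calculation. The sketch below is not the authors' code; it is a minimal illustration under stated assumptions: the 2512 participants are split equally (1256 per arm), power is computed with a textbook two-sided two-proportion z-test using unpooled variance, misclassification is nondifferential between arms, and NLP-screened human abstraction is treated as having specificity 1.0 because every NLP-positive record is manually verified. The function names are hypothetical. Under these assumptions the observed prevalence is `Se * p + (1 - Sp) * (1 - p)`, so the observed risk difference shrinks by a factor of `Se + Sp - 1`.

```python
from math import sqrt
from statistics import NormalDist


def observed_prevalence(p, sensitivity, specificity):
    """Prevalence as seen through an imperfect, nondifferential classifier."""
    return sensitivity * p + (1.0 - specificity) * (1.0 - p)


def power(p_control, risk_diff, n_per_arm,
          sensitivity=1.0, specificity=1.0, alpha=0.05):
    """Approximate power of a two-sided two-proportion z-test (unpooled variance)."""
    p0 = observed_prevalence(p_control, sensitivity, specificity)
    p1 = observed_prevalence(p_control + risk_diff, sensitivity, specificity)
    se = sqrt(p0 * (1 - p0) / n_per_arm + p1 * (1 - p1) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    return NormalDist().cdf(abs(p1 - p0) / se - z_alpha)


def min_detectable_diff(p_control, n_per_arm, target_power=0.80, **kwargs):
    """Bisection search for the smallest true risk difference reaching target power."""
    lo, hi = 0.0, 1.0 - p_control
    for _ in range(60):
        mid = (lo + hi) / 2
        if power(p_control, mid, n_per_arm, **kwargs) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


if __name__ == "__main__":
    n = 2512 // 2  # assumed equal allocation; not stated in the abstract
    manual = min_detectable_diff(0.335, n)
    screened = min_detectable_diff(0.335, n, sensitivity=0.926, specificity=1.0)
    print(f"manual abstraction:   {manual:.3f}")
    print(f"NLP-screened (Se .926): {screened:.3f}")
```

With these inputs the two results land close to the 5.4% and 5.7% detectable risk differences reported in the abstract. The 7.6% figure for NLP alone cannot be reproduced from the abstract, because the sensitivity and specificity at the chosen NLP threshold are not given there; supplying them as `sensitivity` and `specificity` would extend the same calculation.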