Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome

IMPORTANCE: Many clinical trial outcomes are documented in free-text electronic health records (EHRs), making manual data collection costly and infeasible at scale. Natural language processing (NLP) is a promising approach for measuring such outcomes efficiently, but ignoring NLP-related misclassifi...

Bibliographic Details
Main Authors: Lee, Robert Y., Kross, Erin K., Torrence, Janaki, Li, Kevin S., Sibley, James, Cohen, Trevor, Lober, William B., Engelberg, Ruth A., Curtis, J. Randall
Format: Online Article Text
Language: English
Published: American Medical Association 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9982698/
https://www.ncbi.nlm.nih.gov/pubmed/36862411
http://dx.doi.org/10.1001/jamanetworkopen.2023.1204
collection PubMed
description IMPORTANCE: Many clinical trial outcomes are documented in free-text electronic health records (EHRs), making manual data collection costly and infeasible at scale. Natural language processing (NLP) is a promising approach for measuring such outcomes efficiently, but ignoring NLP-related misclassification may lead to underpowered studies.
OBJECTIVE: To evaluate the performance, feasibility, and power implications of using NLP to measure the primary outcome of EHR-documented goals-of-care discussions in a pragmatic randomized clinical trial of a communication intervention.
DESIGN, SETTING, AND PARTICIPANTS: This diagnostic study compared the performance, feasibility, and power implications of measuring EHR-documented goals-of-care discussions using 3 approaches: (1) deep-learning NLP, (2) NLP-screened human abstraction (manual verification of NLP-positive records), and (3) conventional manual abstraction. The study included hospitalized patients aged 55 years or older with serious illness enrolled between April 23, 2020, and March 26, 2021, in a pragmatic randomized clinical trial of a communication intervention in a multihospital US academic health system.
MAIN OUTCOMES AND MEASURES: Main outcomes were NLP performance characteristics, human abstractor-hours, and misclassification-adjusted statistical power of methods of measuring clinician-documented goals-of-care discussions. Performance of NLP was evaluated with receiver operating characteristic (ROC) curves and precision-recall (PR) analyses, and the effects of misclassification on power were examined using mathematical substitution and Monte Carlo simulation.
RESULTS: A total of 2512 trial participants (mean [SD] age, 71.7 [10.8] years; 1456 [58%] female) amassed 44 324 clinical notes during 30-day follow-up. In a validation sample of 159 participants, deep-learning NLP trained on a separate training data set identified patients with documented goals-of-care discussions with moderate accuracy (maximal F1 score, 0.82; area under the ROC curve, 0.924; area under the PR curve, 0.879). Manual abstraction of the outcome from the trial data set would require an estimated 2000 abstractor-hours and would power the trial to detect a risk difference of 5.4% (assuming 33.5% control-arm prevalence, 80% power, and 2-sided α = .05). Measuring the outcome by NLP alone would power the trial to detect a risk difference of 7.6%. Measuring the outcome by NLP-screened human abstraction would require 34.3 abstractor-hours to achieve estimated sensitivity of 92.6% and would power the trial to detect a risk difference of 5.7%. Monte Carlo simulations corroborated misclassification-adjusted power calculations.
CONCLUSIONS AND RELEVANCE: In this diagnostic study, deep-learning NLP and NLP-screened human abstraction had favorable characteristics for measuring an EHR outcome at scale. Adjusted power calculations accurately quantified power loss from NLP-related misclassification, suggesting that incorporation of this approach into the design of studies using NLP would be beneficial.
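The abstract does not spell out the "mathematical substitution" step, so the following is only a minimal sketch of one common way to do it, not the authors' implementation. It substitutes the prevalence an imperfect classifier would report into a standard two-proportion normal-approximation power formula and searches for the smallest detectable risk difference. Assumptions not stated in the record: equal allocation (1256 of the 2512 participants per arm), specificity of 1.0 for NLP-screened human abstraction (taken here to follow from manual verification of NLP-positive records), and all function names, which are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def observed_prevalence(p, sensitivity, specificity):
    """Prevalence an imperfect classifier reports when the true prevalence is p."""
    return sensitivity * p + (1.0 - specificity) * (1.0 - p)


def power(p_control, risk_diff, n_per_arm,
          sensitivity=1.0, specificity=1.0, alpha=0.05):
    """Approximate power of a 2-sided two-proportion z test on the measured outcome."""
    q1 = observed_prevalence(p_control, sensitivity, specificity)
    q2 = observed_prevalence(p_control + risk_diff, sensitivity, specificity)
    se = sqrt((q1 * (1 - q1) + q2 * (1 - q2)) / n_per_arm)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Normal approximation; the negligible opposite-tail contribution is ignored.
    return NormalDist().cdf(abs(q2 - q1) / se - z_alpha)


def min_detectable_risk_diff(p_control, n_per_arm, sensitivity=1.0,
                             specificity=1.0, target_power=0.80, alpha=0.05):
    """Smallest true risk difference that reaches target_power (bisection search)."""
    lo, hi = 0.0, 1.0 - p_control
    for _ in range(60):
        mid = (lo + hi) / 2
        if power(p_control, mid, n_per_arm,
                 sensitivity, specificity, alpha) < target_power:
            lo = mid
        else:
            hi = mid
    return hi


# Assumed equal allocation: 2512 participants split into two arms of 1256.
N_PER_ARM = 1256
manual = min_detectable_risk_diff(0.335, N_PER_ARM)
# Assumed specificity of 1.0 for NLP-screened human abstraction.
screened = min_detectable_risk_diff(0.335, N_PER_ARM, sensitivity=0.926)
print(f"manual abstraction:   {manual:.3f}")
print(f"NLP-screened (assumed): {screened:.3f}")
```

Under these assumptions the two values come out close to the 5.4% and 5.7% risk differences reported in the abstract. The NLP-alone figure of 7.6% is not reproduced here because the abstract does not give the sensitivity and specificity of the operating point that was used.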
format Online
Article
Text
id pubmed-9982698
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Medical Association
record_format MEDLINE/PubMed
spelling pubmed-9982698 2023-03-04 JAMA Netw Open Original Investigation American Medical Association 2023-03-02 /pmc/articles/PMC9982698/ /pubmed/36862411 http://dx.doi.org/10.1001/jamanetworkopen.2023.1204 Text en Copyright 2023 Lee RY et al. JAMA Network Open. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
title Assessment of Natural Language Processing of Electronic Health Records to Measure Goals-of-Care Discussions as a Clinical Trial Outcome
topic Original Investigation