Automating Access to Real-World Evidence

INTRODUCTION: Real-world evidence is important in regulatory and funding decisions. Manual data extraction from electronic health records (EHRs) is time-consuming and challenging to maintain. Automated extraction using natural language processing (NLP) and artificial intelligence may facilitate this process. Although NLP offers a faster solution than manual extraction, the validity of the extracted data remains in question. The current study compared manual and automated data extraction from the EHRs of patients with advanced lung cancer.

METHODS: Previously, we extracted data from the EHRs of 1209 patients diagnosed with advanced lung cancer (stage IIIB or IV) between January 2015 and December 2017 at Princess Margaret Cancer Centre (Toronto, Canada) using the commercially available artificial intelligence engine DARWEN (Pentavere, Ontario, Canada). For comparison, 100 of the 333 patients who received systemic therapy were randomly selected, and clinical data were manually extracted by two trained abstractors using the same accepted gold standard feature definitions, covering patient and disease characteristics and treatment data. All cases were re-reviewed by an expert adjudicator. Accuracy and concordance between automated and manual methods are reported.

RESULTS: Automated extraction required considerably less time (<1 day) than manual extraction (approximately 225 person-hours). Demographic data (age, sex, diagnosis) were collected with high accuracy and concordance by both methods (96%–100%). Accuracy (for either extraction approach) and concordance were lower for unstructured data elements in the EHR, such as performance status, date of diagnosis, and smoking status (NLP accuracy: 88%–94%; manual accuracy: 78%–94%; concordance: 71%–82%). Concurrent medications (86%–100%) and comorbid conditions (96%–100%) were reported with high accuracy and concordance. Treatment details were also accurately captured by both methods (84%–100%) and were highly concordant (83%–99%). Detection of whether biomarker testing was performed was highly accurate and concordant (96%–98%), although detection of biomarker test results was more variable (accuracy: 84%–100%; concordance: 84%–99%). Features with syntactic or semantic variation requiring clinical interpretation were extracted with slightly lower accuracy by both NLP and manual review. For example, metastatic sites were identified more accurately through NLP extraction (NLP: 88%–99%; manual: 71%–100%; concordance: 70%–99%), with the exception of lung and lymph node metastases (NLP: 66%–71%; manual: 87%–92%; concordance: 58%), because analogous terms used in radiology reports were not included in the accepted gold standard definition.

CONCLUSIONS: Automated data abstraction from EHRs is highly accurate and faster than manual abstraction. Key challenges include poorly structured EHRs and the use of analogous terms beyond the accepted gold standard definition. The application of NLP can facilitate real-world evidence studies at a greater scale than could be achieved with manual data extraction.
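The study reports, for each clinical feature, accuracy against an expert-adjudicated gold standard and concordance between the NLP and manual abstractions. As an illustration only, the minimal Python sketch below shows how such per-feature metrics can be computed; the function names and toy feature values are hypothetical and are not taken from the paper.

```python
from typing import List

def accuracy(extracted: List[str], gold: List[str]) -> float:
    """Fraction of patients whose extracted value matches the adjudicated gold standard."""
    assert len(extracted) == len(gold)
    return sum(e == g for e, g in zip(extracted, gold)) / len(gold)

def concordance(nlp: List[str], manual: List[str]) -> float:
    """Fraction of patients for whom the NLP and manual abstractions agree with each other."""
    assert len(nlp) == len(manual)
    return sum(a == b for a, b in zip(nlp, manual)) / len(nlp)

# Hypothetical toy data for a single feature (smoking status) over five patients.
gold   = ["current", "former", "never", "former", "current"]
nlp    = ["current", "former", "never", "never",  "current"]
manual = ["current", "former", "never", "former", "never"]

print(f"NLP accuracy:    {accuracy(nlp, gold):.0%}")      # 80%
print(f"Manual accuracy: {accuracy(manual, gold):.0%}")   # 80%
print(f"Concordance:     {concordance(nlp, manual):.0%}") # 60%
```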

Bibliographic Details

Main Authors: Gauthier, Marie-Pier, Law, Jennifer H., Le, Lisa W., Li, Janice J.N., Zahir, Sajda, Nirmalakumar, Sharon, Sung, Mike, Pettengell, Christopher, Aviv, Steven, Chu, Ryan, Sacher, Adrian, Liu, Geoffrey, Bradbury, Penelope, Shepherd, Frances A., Leighl, Natasha B.
Format: Online Article Text
Language: English
Published: JTO Clin Res Rep (Elsevier), 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9201015/
https://www.ncbi.nlm.nih.gov/pubmed/35719866
http://dx.doi.org/10.1016/j.jtocrr.2022.100340