Automating Access to Real-World Evidence
INTRODUCTION: Real-world evidence is important in regulatory and funding decisions. Manual data extraction from electronic health records (EHRs) is time-consuming and challenging to maintain. Automated extraction using natural language processing (NLP) and artificial intelligence may facilitate this...
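Main Authors: | Gauthier, Marie-Pier; Law, Jennifer H.; Le, Lisa W.; Li, Janice J.N.; Zahir, Sajda; Nirmalakumar, Sharon; Sung, Mike; Pettengell, Christopher; Aviv, Steven; Chu, Ryan; Sacher, Adrian; Liu, Geoffrey; Bradbury, Penelope; Shepherd, Frances A.; Leighl, Natasha B. |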
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2022 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9201015/ https://www.ncbi.nlm.nih.gov/pubmed/35719866 http://dx.doi.org/10.1016/j.jtocrr.2022.100340 |
_version_ | 1784728193232535552 |
---|---|
author | Gauthier, Marie-Pier; Law, Jennifer H.; Le, Lisa W.; Li, Janice J.N.; Zahir, Sajda; Nirmalakumar, Sharon; Sung, Mike; Pettengell, Christopher; Aviv, Steven; Chu, Ryan; Sacher, Adrian; Liu, Geoffrey; Bradbury, Penelope; Shepherd, Frances A.; Leighl, Natasha B. |
author_facet | Gauthier, Marie-Pier; Law, Jennifer H.; Le, Lisa W.; Li, Janice J.N.; Zahir, Sajda; Nirmalakumar, Sharon; Sung, Mike; Pettengell, Christopher; Aviv, Steven; Chu, Ryan; Sacher, Adrian; Liu, Geoffrey; Bradbury, Penelope; Shepherd, Frances A.; Leighl, Natasha B. |
author_sort | Gauthier, Marie-Pier |
collection | PubMed |
description | INTRODUCTION: Real-world evidence is important in regulatory and funding decisions. Manual data extraction from electronic health records (EHRs) is time-consuming and challenging to maintain. Automated extraction using natural language processing (NLP) and artificial intelligence may facilitate this process. Whereas NLP offers a faster solution than manual methods of extraction, the validity of extracted data remains in question. The current study compared manual and automated data extraction from the EHR of patients with advanced lung cancer. METHODS: Previously, we extracted EHRs from 1209 patients diagnosed with advanced lung cancer (stage IIIB or IV) between January 2015 and December 2017 at Princess Margaret Cancer Centre (Toronto, Canada) using the commercially available artificial intelligence engine, DARWEN (Pentavere, Ontario, Canada). For comparison, 100 of 333 patients who received systemic therapy were randomly selected, and clinical data were manually extracted by two trained abstractors using the same accepted gold standard feature definitions, including patient characteristics, disease characteristics, and treatment data. All cases were re-reviewed by an expert adjudicator. Accuracy and concordance between automated and manual methods are reported. RESULTS: Automated extraction required considerably less time (<1 day) than manual extraction (∼225 person-hours). The collection of demographic data (age, sex, diagnosis) was highly accurate and concordant with both methods (96%–100%). Accuracy (for either extraction approach) and concordance were lower for unstructured data elements in EHR, such as performance status, date of diagnosis, and smoking status (NLP accuracy: 88%–94%; manual accuracy: 78%–94%; concordance: 71%–82%). Concurrent medications (86%–100%) and comorbid conditions (96%–100%) were reported with high accuracy and concordance. Treatment details were also accurately captured with both methods (84%–100%) and highly concordant (83%–99%). Detection of whether biomarker testing was performed was highly accurate and concordant (96%–98%), although detection of biomarker test results was more variable (accuracy 84%–100%, concordance 84%–99%). Features with syntactic or semantic variation requiring clinical interpretation were extracted with slightly lower accuracy by both NLP and manual review. For example, metastatic sites were more accurately identified through NLP extraction (NLP: 88%–99%; manual: 71%–100%; concordance: 70%–99%) with the exception of lung and lymph node metastases (NLP: 66%–71%; manual: 87%–92%; concordance: 58%) owing to analogous terms used in radiology reports not being included in the accepted gold standard definition. CONCLUSIONS: Automated data abstraction from EHR is highly accurate and faster than manual abstraction. Key challenges include poorly structured EHR and the use of analogous terms beyond the accepted gold standard definition. The application of NLP can facilitate real-world evidence studies at a greater scale than could be achieved with manual data extraction. |
format | Online Article Text |
id | pubmed-9201015 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-9201015 2022-06-17 Automating Access to Real-World Evidence Gauthier, Marie-Pier Law, Jennifer H. Le, Lisa W. Li, Janice J.N. Zahir, Sajda Nirmalakumar, Sharon Sung, Mike Pettengell, Christopher Aviv, Steven Chu, Ryan Sacher, Adrian Liu, Geoffrey Bradbury, Penelope Shepherd, Frances A. Leighl, Natasha B. JTO Clin Res Rep Original Article INTRODUCTION: Real-world evidence is important in regulatory and funding decisions. Manual data extraction from electronic health records (EHRs) is time-consuming and challenging to maintain. Automated extraction using natural language processing (NLP) and artificial intelligence may facilitate this process. Whereas NLP offers a faster solution than manual methods of extraction, the validity of extracted data remains in question. The current study compared manual and automated data extraction from the EHR of patients with advanced lung cancer. METHODS: Previously, we extracted EHRs from 1209 patients diagnosed with advanced lung cancer (stage IIIB or IV) between January 2015 and December 2017 at Princess Margaret Cancer Centre (Toronto, Canada) using the commercially available artificial intelligence engine, DARWEN (Pentavere, Ontario, Canada). For comparison, 100 of 333 patients who received systemic therapy were randomly selected, and clinical data were manually extracted by two trained abstractors using the same accepted gold standard feature definitions, including patient characteristics, disease characteristics, and treatment data. All cases were re-reviewed by an expert adjudicator. Accuracy and concordance between automated and manual methods are reported. RESULTS: Automated extraction required considerably less time (<1 day) than manual extraction (∼225 person-hours). The collection of demographic data (age, sex, diagnosis) was highly accurate and concordant with both methods (96%–100%). Accuracy (for either extraction approach) and concordance were lower for unstructured data elements in EHR, such as performance status, date of diagnosis, and smoking status (NLP accuracy: 88%–94%; manual accuracy: 78%–94%; concordance: 71%–82%). Concurrent medications (86%–100%) and comorbid conditions (96%–100%) were reported with high accuracy and concordance. Treatment details were also accurately captured with both methods (84%–100%) and highly concordant (83%–99%). Detection of whether biomarker testing was performed was highly accurate and concordant (96%–98%), although detection of biomarker test results was more variable (accuracy 84%–100%, concordance 84%–99%). Features with syntactic or semantic variation requiring clinical interpretation were extracted with slightly lower accuracy by both NLP and manual review. For example, metastatic sites were more accurately identified through NLP extraction (NLP: 88%–99%; manual: 71%–100%; concordance: 70%–99%) with the exception of lung and lymph node metastases (NLP: 66%–71%; manual: 87%–92%; concordance: 58%) owing to analogous terms used in radiology reports not being included in the accepted gold standard definition. CONCLUSIONS: Automated data abstraction from EHR is highly accurate and faster than manual abstraction. Key challenges include poorly structured EHR and the use of analogous terms beyond the accepted gold standard definition. The application of NLP can facilitate real-world evidence studies at a greater scale than could be achieved with manual data extraction. Elsevier 2022-05-17 /pmc/articles/PMC9201015/ /pubmed/35719866 http://dx.doi.org/10.1016/j.jtocrr.2022.100340 Text en © 2022 The Authors https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Original Article Gauthier, Marie-Pier Law, Jennifer H. Le, Lisa W. Li, Janice J.N. Zahir, Sajda Nirmalakumar, Sharon Sung, Mike Pettengell, Christopher Aviv, Steven Chu, Ryan Sacher, Adrian Liu, Geoffrey Bradbury, Penelope Shepherd, Frances A. Leighl, Natasha B. Automating Access to Real-World Evidence |
title | Automating Access to Real-World Evidence |
title_full | Automating Access to Real-World Evidence |
title_fullStr | Automating Access to Real-World Evidence |
title_full_unstemmed | Automating Access to Real-World Evidence |
title_short | Automating Access to Real-World Evidence |
title_sort | automating access to real-world evidence |
topic | Original Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9201015/ https://www.ncbi.nlm.nih.gov/pubmed/35719866 http://dx.doi.org/10.1016/j.jtocrr.2022.100340 |
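
The description reports per-feature accuracy and concordance for the two extraction methods but does not state how they were computed. The following is a minimal sketch under assumptions, not the authors' method: accuracy is taken as simple agreement with the adjudicated value, and concordance as simple agreement between the NLP and manual values. The function names and the toy smoking-status data are hypothetical and are not taken from the study.

```python
from typing import Hashable, Sequence


def agreement(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Fraction of positions at which two equally long sequences match."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def accuracy(extracted: Sequence[Hashable], adjudicated: Sequence[Hashable]) -> float:
    """Assumed definition: agreement of one method with the adjudicated values."""
    return agreement(extracted, adjudicated)


def concordance(nlp: Sequence[Hashable], manual: Sequence[Hashable]) -> float:
    """Assumed definition: agreement between the two methods, right or wrong."""
    return agreement(nlp, manual)


# Hypothetical toy data for one feature (smoking status) across five patients.
adjudicated = ["current", "former", "never", "former", "never"]
nlp_values = ["current", "former", "never", "never", "never"]
manual_values = ["current", "never", "never", "former", "never"]

print(f"NLP accuracy:    {accuracy(nlp_values, adjudicated):.0%}")
print(f"Manual accuracy: {accuracy(manual_values, adjudicated):.0%}")
print(f"Concordance:     {concordance(nlp_values, manual_values):.0%}")
```

In this toy example each method matches the adjudicated value for four of five patients, yet the two methods agree with each other on only three, which illustrates how concordance can fall below both accuracies, as in several of the ranges reported in the description.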