
Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning


Bibliographic Details
Main Authors: Benedum, Corey M., Sondhi, Arjun, Fidyk, Erin, Cohen, Aaron B., Nemeth, Sheila, Adamson, Blythe, Estévez, Melissa, Bozkurt, Selen
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10046618/
https://www.ncbi.nlm.nih.gov/pubmed/36980739
http://dx.doi.org/10.3390/cancers15061853
author Benedum, Corey M.
Sondhi, Arjun
Fidyk, Erin
Cohen, Aaron B.
Nemeth, Sheila
Adamson, Blythe
Estévez, Melissa
Bozkurt, Selen
author_sort Benedum, Corey M.
collection PubMed
description SIMPLE SUMMARY: Obtaining and structuring information about the characteristics, treatments, and outcomes of people living with cancer for research purposes is difficult and resource-intensive. Oftentimes, this information can only be found in electronic health records (EHRs). In response, researchers use natural language processing with machine learning (ML extraction) techniques to extract information at scale. This study evaluated the quality and fitness-for-use of EHR-derived oncology data curated using ML extraction, relative to the standard approach, abstraction by trained experts. Using patients with lung cancer from a real-world database, we performed replication analyses demonstrating common analyses conducted in observational research. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted then with ML-extracted data. The study’s results and conclusions were similar regardless of the data curation method used. These results demonstrate that high-performance ML-extracted variables trained on expert-abstracted data can achieve similar results as when using abstracted data, unlocking the ability to perform oncology research at scale. ABSTRACT: Meaningful real-world evidence (RWE) generation requires unstructured data found in electronic health records (EHRs) which are often missing from administrative claims; however, obtaining relevant data from unstructured EHR sources is resource-intensive. In response, researchers are using natural language processing (NLP) with machine learning (ML) techniques (i.e., ML extraction) to extract real-world data (RWD) at scale. This study assessed the quality and fitness-for-use of EHR-derived oncology data curated using NLP with ML as compared to the reference standard of expert abstraction. 
Using a sample of 186,313 patients with lung cancer from a nationwide EHR-derived de-identified database, we performed a series of replication analyses demonstrating some common analyses conducted in retrospective observational research with complex EHR-derived data to generate evidence. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted then with ML-extracted data. We utilized the biomarker- and treatment-defined cohorts to perform analyses related to biomarker-associated survival and treatment comparative effectiveness, respectively. Across all analyses, the results differed by less than 8% between the data curation methods, and similar conclusions were reached. These results highlight that high-performance ML-extracted variables trained on expert-abstracted data can achieve similar results as when using abstracted data, unlocking the ability to perform oncology research at scale.
format Online
Article
Text
id pubmed-10046618
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10046618 2023-03-29 Cancers (Basel) Article MDPI 2023-03-20 /pmc/articles/PMC10046618/ /pubmed/36980739 http://dx.doi.org/10.3390/cancers15061853 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning
topic Article