Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning

SIMPLE SUMMARY: Obtaining and structuring information about the characteristics, treatments, and outcomes of people living with cancer for research purposes is difficult and resource-intensive. Oftentimes, this information can only be found in electronic health records (EHRs). In response, researche...

Bibliographic Details
Main Authors:	Benedum, Corey M.; Sondhi, Arjun; Fidyk, Erin; Cohen, Aaron B.; Nemeth, Sheila; Adamson, Blythe; Estévez, Melissa; Bozkurt, Selen
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10046618/ https://www.ncbi.nlm.nih.gov/pubmed/36980739 http://dx.doi.org/10.3390/cancers15061853

_version_	1785013718815342592
author	Benedum, Corey M.; Sondhi, Arjun; Fidyk, Erin; Cohen, Aaron B.; Nemeth, Sheila; Adamson, Blythe; Estévez, Melissa; Bozkurt, Selen
author_facet	Benedum, Corey M.; Sondhi, Arjun; Fidyk, Erin; Cohen, Aaron B.; Nemeth, Sheila; Adamson, Blythe; Estévez, Melissa; Bozkurt, Selen
author_sort	Benedum, Corey M.
collection	PubMed
description	SIMPLE SUMMARY: Obtaining and structuring information about the characteristics, treatments, and outcomes of people living with cancer for research purposes is difficult and resource-intensive. Oftentimes, this information can only be found in electronic health records (EHRs). In response, researchers use natural language processing with machine learning (ML extraction) techniques to extract information at scale. This study evaluated the quality and fitness-for-use of EHR-derived oncology data curated using ML extraction, relative to the standard approach, abstraction by trained experts. Using patients with lung cancer from a real-world database, we performed replication analyses demonstrating common analyses conducted in observational research. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted then with ML-extracted data. The study’s results and conclusions were similar regardless of the data curation method used. These results demonstrate that high-performance ML-extracted variables trained on expert-abstracted data can achieve similar results as when using abstracted data, unlocking the ability to perform oncology research at scale. ABSTRACT: Meaningful real-world evidence (RWE) generation requires unstructured data found in electronic health records (EHRs) which are often missing from administrative claims; however, obtaining relevant data from unstructured EHR sources is resource-intensive. In response, researchers are using natural language processing (NLP) with machine learning (ML) techniques (i.e., ML extraction) to extract real-world data (RWD) at scale. This study assessed the quality and fitness-for-use of EHR-derived oncology data curated using NLP with ML as compared to the reference standard of expert abstraction. 
Using a sample of 186,313 patients with lung cancer from a nationwide EHR-derived de-identified database, we performed a series of replication analyses demonstrating some common analyses conducted in retrospective observational research with complex EHR-derived data to generate evidence. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted then with ML-extracted data. We utilized the biomarker- and treatment-defined cohorts to perform analyses related to biomarker-associated survival and treatment comparative effectiveness, respectively. Across all analyses, the results differed by less than 8% between the data curation methods, and similar conclusions were reached. These results highlight that high-performance ML-extracted variables trained on expert-abstracted data can achieve similar results as when using abstracted data, unlocking the ability to perform oncology research at scale.
format	Online Article Text
id	pubmed-10046618
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-100466182023-03-29 Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning Benedum, Corey M. Sondhi, Arjun Fidyk, Erin Cohen, Aaron B. Nemeth, Sheila Adamson, Blythe Estévez, Melissa Bozkurt, Selen Cancers (Basel) Article SIMPLE SUMMARY: Obtaining and structuring information about the characteristics, treatments, and outcomes of people living with cancer for research purposes is difficult and resource-intensive. Oftentimes, this information can only be found in electronic health records (EHRs). In response, researchers use natural language processing with machine learning (ML extraction) techniques to extract information at scale. This study evaluated the quality and fitness-for-use of EHR-derived oncology data curated using ML extraction, relative to the standard approach, abstraction by trained experts. Using patients with lung cancer from a real-world database, we performed replication analyses demonstrating common analyses conducted in observational research. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted then with ML-extracted data. The study’s results and conclusions were similar regardless of the data curation method used. These results demonstrate that high-performance ML-extracted variables trained on expert-abstracted data can achieve similar results as when using abstracted data, unlocking the ability to perform oncology research at scale. ABSTRACT: Meaningful real-world evidence (RWE) generation requires unstructured data found in electronic health records (EHRs) which are often missing from administrative claims; however, obtaining relevant data from unstructured EHR sources is resource-intensive. In response, researchers are using natural language processing (NLP) with machine learning (ML) techniques (i.e., ML extraction) to extract real-world data (RWD) at scale. 
This study assessed the quality and fitness-for-use of EHR-derived oncology data curated using NLP with ML as compared to the reference standard of expert abstraction. Using a sample of 186,313 patients with lung cancer from a nationwide EHR-derived de-identified database, we performed a series of replication analyses demonstrating some common analyses conducted in retrospective observational research with complex EHR-derived data to generate evidence. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted then with ML-extracted data. We utilized the biomarker- and treatment-defined cohorts to perform analyses related to biomarker-associated survival and treatment comparative effectiveness, respectively. Across all analyses, the results differed by less than 8% between the data curation methods, and similar conclusions were reached. These results highlight that high-performance ML-extracted variables trained on expert-abstracted data can achieve similar results as when using abstracted data, unlocking the ability to perform oncology research at scale. MDPI 2023-03-20 /pmc/articles/PMC10046618/ /pubmed/36980739 http://dx.doi.org/10.3390/cancers15061853 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10046618/ https://www.ncbi.nlm.nih.gov/pubmed/36980739 http://dx.doi.org/10.3390/cancers15061853
