Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning
SIMPLE SUMMARY: Obtaining and structuring information about the characteristics, treatments, and outcomes of people living with cancer for research purposes is difficult and resource-intensive. Oftentimes, this information can only be found in electronic health records (EHRs). In response, researche...
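Main authors: | Benedum, Corey M.; Sondhi, Arjun; Fidyk, Erin; Cohen, Aaron B.; Nemeth, Sheila; Adamson, Blythe; Estévez, Melissa; Bozkurt, Selen |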
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10046618/ https://www.ncbi.nlm.nih.gov/pubmed/36980739 http://dx.doi.org/10.3390/cancers15061853 |
_version_ | 1785013718815342592 |
---|---|
author | Benedum, Corey M. Sondhi, Arjun Fidyk, Erin Cohen, Aaron B. Nemeth, Sheila Adamson, Blythe Estévez, Melissa Bozkurt, Selen |
author_facet | Benedum, Corey M. Sondhi, Arjun Fidyk, Erin Cohen, Aaron B. Nemeth, Sheila Adamson, Blythe Estévez, Melissa Bozkurt, Selen |
author_sort | Benedum, Corey M. |
collection | PubMed |
description | SIMPLE SUMMARY: Obtaining and structuring information about the characteristics, treatments, and outcomes of people living with cancer for research purposes is difficult and resource-intensive. Oftentimes, this information can only be found in electronic health records (EHRs). In response, researchers use natural language processing with machine learning (ML extraction) techniques to extract information at scale. This study evaluated the quality and fitness-for-use of EHR-derived oncology data curated using ML extraction, relative to the standard approach, abstraction by trained experts. Using patients with lung cancer from a real-world database, we performed replication analyses demonstrating common analyses conducted in observational research. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted then with ML-extracted data. The study’s results and conclusions were similar regardless of the data curation method used. These results demonstrate that high-performance ML-extracted variables trained on expert-abstracted data can achieve similar results as when using abstracted data, unlocking the ability to perform oncology research at scale. ABSTRACT: Meaningful real-world evidence (RWE) generation requires unstructured data found in electronic health records (EHRs) which are often missing from administrative claims; however, obtaining relevant data from unstructured EHR sources is resource-intensive. In response, researchers are using natural language processing (NLP) with machine learning (ML) techniques (i.e., ML extraction) to extract real-world data (RWD) at scale. This study assessed the quality and fitness-for-use of EHR-derived oncology data curated using NLP with ML as compared to the reference standard of expert abstraction. Using a sample of 186,313 patients with lung cancer from a nationwide EHR-derived de-identified database, we performed a series of replication analyses demonstrating some common analyses conducted in retrospective observational research with complex EHR-derived data to generate evidence. Eligible patients were selected into biomarker- and treatment-defined cohorts, first with expert-abstracted then with ML-extracted data. We utilized the biomarker- and treatment-defined cohorts to perform analyses related to biomarker-associated survival and treatment comparative effectiveness, respectively. Across all analyses, the results differed by less than 8% between the data curation methods, and similar conclusions were reached. These results highlight that high-performance ML-extracted variables trained on expert-abstracted data can achieve similar results as when using abstracted data, unlocking the ability to perform oncology research at scale. |
format | Online Article Text |
id | pubmed-10046618 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-10046618 2023-03-29 Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning Benedum, Corey M. Sondhi, Arjun Fidyk, Erin Cohen, Aaron B. Nemeth, Sheila Adamson, Blythe Estévez, Melissa Bozkurt, Selen Cancers (Basel) Article MDPI 2023-03-20 /pmc/articles/PMC10046618/ /pubmed/36980739 http://dx.doi.org/10.3390/cancers15061853 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Benedum, Corey M. Sondhi, Arjun Fidyk, Erin Cohen, Aaron B. Nemeth, Sheila Adamson, Blythe Estévez, Melissa Bozkurt, Selen Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning |
title | Replication of Real-World Evidence in Oncology Using Electronic Health Record Data Extracted by Machine Learning |
title_sort | replication of real-world evidence in oncology using electronic health record data extracted by machine learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10046618/ https://www.ncbi.nlm.nih.gov/pubmed/36980739 http://dx.doi.org/10.3390/cancers15061853 |