
Considerations for the Use of Machine Learning Extracted Real-World Data to Support Evidence Generation: A Research-Centric Evaluation Framework

SIMPLE SUMMARY: Many patient clinical characteristics, such as diagnosis dates, biomarker status, and therapies received, are only available as unstructured text in electronic health records. Obtaining this information for research purposes is a difficult and costly process, requiring trained clinic...


Bibliographic Details
Main authors:	Estevez, Melissa; Benedum, Corey M.; Jiang, Chengsheng; Cohen, Aaron B.; Phadke, Sharang; Sarkar, Somnath; Bozkurt, Selen
Format:	Online Article Text
Language:	English
Published:	MDPI 2022
Subjects:	Review
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9264846/ https://www.ncbi.nlm.nih.gov/pubmed/35804834 http://dx.doi.org/10.3390/cancers14133063

collection	PubMed
description	SIMPLE SUMMARY: Many patient clinical characteristics, such as diagnosis dates, biomarker status, and therapies received, are only available as unstructured text in electronic health records. Obtaining this information for research purposes is a difficult and costly process, requiring trained clinical experts to manually review patient documents. Machine Learning techniques offer a promising solution for efficiently extracting clinically relevant information from unstructured text found in patient documents. However, the use of data produced with machine learning techniques for research purposes introduces unique challenges in assessing validity and generalizability to different cohorts of interest. To enable the effective and accurate use of such data for research purposes, we developed an evaluation framework to be utilized by model developers, data users, and other stakeholders. This framework can serve as a baseline to contextualize the quality, strengths, and limitations of using data produced with machine learning techniques for research purposes. ABSTRACT: A vast amount of real-world data, such as pathology reports and clinical notes, are captured as unstructured text in electronic health records (EHRs). However, this information is both difficult and costly to extract through human abstraction, especially when scaling to large datasets is needed. Fortunately, Natural Language Processing (NLP) and Machine Learning (ML) techniques provide promising solutions for a variety of information extraction tasks such as identifying a group of patients who have a specific diagnosis, share common characteristics, or show progression of a disease. However, using these ML-extracted data for research still introduces unique challenges in assessing validity and generalizability to different cohorts of interest. 
In order to enable effective and accurate use of ML-extracted real-world data (RWD) to support research and real-world evidence generation, we propose a research-centric evaluation framework for model developers, ML-extracted data users and other RWD stakeholders. This framework covers the fundamentals of evaluating RWD produced using ML methods to maximize the use of EHR data for research purposes.
id	pubmed-9264846
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-9264846 2022-07-09 Cancers (Basel) Review MDPI 2022-06-22 /pmc/articles/PMC9264846/ /pubmed/35804834 http://dx.doi.org/10.3390/cancers14133063 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Considerations for the Use of Machine Learning Extracted Real-World Data to Support Evidence Generation: A Research-Centric Evaluation Framework
