
Approach to machine learning for extraction of real-world data variables from electronic health records


Bibliographic Details
Main Authors: Adamson, Blythe, Waskom, Michael, Blarre, Auriane, Kelly, Jonathan, Krismer, Konstantin, Nemeth, Sheila, Gippetti, James, Ritten, John, Harrison, Katherine, Ho, George, Linzmayer, Robin, Bansal, Tarun, Wilkinson, Samuel, Amster, Guy, Estola, Evan, Benedum, Corey M., Fidyk, Erin, Estévez, Melissa, Shapiro, Will, Cohen, Aaron B.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Pharmacology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10541019/
https://www.ncbi.nlm.nih.gov/pubmed/37781703
http://dx.doi.org/10.3389/fphar.2023.1180962
author Adamson, Blythe
Waskom, Michael
Blarre, Auriane
Kelly, Jonathan
Krismer, Konstantin
Nemeth, Sheila
Gippetti, James
Ritten, John
Harrison, Katherine
Ho, George
Linzmayer, Robin
Bansal, Tarun
Wilkinson, Samuel
Amster, Guy
Estola, Evan
Benedum, Corey M.
Fidyk, Erin
Estévez, Melissa
Shapiro, Will
Cohen, Aaron B.
collection PubMed
description Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI’s ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (e.g., clinician notes, radiology reports, and lab reports) to output a set of structured variables required for RWD analysis. This research used a nationwide EHR-derived database. Models were selected based on performance. ML-extracted variables are those whose value is determined solely by an ML model (i.e., not confirmed by abstraction) that identifies key information from visit notes and documents. These models do not predict future events or infer missing information. Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of the output variables compared with variables curated by manual abstraction. These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates. Conclusion: NLP and ML enable the extraction of retrospective clinical data in EHR with speed and scalability to help researchers learn from the experience of every person with cancer.
format Online
Article
Text
id pubmed-10541019
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10541019 2023-10-01 Approach to machine learning for extraction of real-world data variables from electronic health records Front Pharmacol Pharmacology Frontiers Media S.A. 2023-09-15 /pmc/articles/PMC10541019/ /pubmed/37781703 http://dx.doi.org/10.3389/fphar.2023.1180962 Text en Copyright © 2023 Adamson, Waskom, Blarre, Kelly, Krismer, Nemeth, Gippetti, Ritten, Harrison, Ho, Linzmayer, Bansal, Wilkinson, Amster, Estola, Benedum, Fidyk, Estévez, Shapiro and Cohen. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Approach to machine learning for extraction of real-world data variables from electronic health records
topic Pharmacology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10541019/
https://www.ncbi.nlm.nih.gov/pubmed/37781703
http://dx.doi.org/10.3389/fphar.2023.1180962