Approach to machine learning for extraction of real-world data variables from electronic health records
Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI’s ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into r...
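Main authors: | Adamson, Blythe; Waskom, Michael; Blarre, Auriane; Kelly, Jonathan; Krismer, Konstantin; Nemeth, Sheila; Gippetti, James; Ritten, John; Harrison, Katherine; Ho, George; Linzmayer, Robin; Bansal, Tarun; Wilkinson, Samuel; Amster, Guy; Estola, Evan; Benedum, Corey M.; Fidyk, Erin; Estévez, Melissa; Shapiro, Will; Cohen, Aaron B. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Pharmacology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10541019/ https://www.ncbi.nlm.nih.gov/pubmed/37781703 http://dx.doi.org/10.3389/fphar.2023.1180962 |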
_version_ | 1785113834139156480 |
---|---|
author | Adamson, Blythe Waskom, Michael Blarre, Auriane Kelly, Jonathan Krismer, Konstantin Nemeth, Sheila Gippetti, James Ritten, John Harrison, Katherine Ho, George Linzmayer, Robin Bansal, Tarun Wilkinson, Samuel Amster, Guy Estola, Evan Benedum, Corey M. Fidyk, Erin Estévez, Melissa Shapiro, Will Cohen, Aaron B. |
author_facet | Adamson, Blythe Waskom, Michael Blarre, Auriane Kelly, Jonathan Krismer, Konstantin Nemeth, Sheila Gippetti, James Ritten, John Harrison, Katherine Ho, George Linzmayer, Robin Bansal, Tarun Wilkinson, Samuel Amster, Guy Estola, Evan Benedum, Corey M. Fidyk, Erin Estévez, Melissa Shapiro, Will Cohen, Aaron B. |
author_sort | Adamson, Blythe |
collection | PubMed |
description | Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI’s ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (e.g., clinician notes, radiology reports, lab reports, etc.) to output a set of structured variables required for RWD analysis. This research used a nationwide electronic health record (EHR)-derived database. Models were selected based on performance. Variables curated with an approach using ML extraction are those where the value is determined solely based on an ML model (i.e. not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information. Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of output variables compared with variables curated by manually abstracted data. These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates. Conclusion: NLP and ML enable the extraction of retrospective clinical data in EHR with speed and scalability to help researchers learn from the experience of every person with cancer. |
format | Online Article Text |
id | pubmed-10541019 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10541019 2023-10-01 Approach to machine learning for extraction of real-world data variables from electronic health records Adamson, Blythe Waskom, Michael Blarre, Auriane Kelly, Jonathan Krismer, Konstantin Nemeth, Sheila Gippetti, James Ritten, John Harrison, Katherine Ho, George Linzmayer, Robin Bansal, Tarun Wilkinson, Samuel Amster, Guy Estola, Evan Benedum, Corey M. Fidyk, Erin Estévez, Melissa Shapiro, Will Cohen, Aaron B. Front Pharmacol Pharmacology Background: As artificial intelligence (AI) continues to advance with breakthroughs in natural language processing (NLP) and machine learning (ML), such as the development of models like OpenAI’s ChatGPT, new opportunities are emerging for efficient curation of electronic health records (EHR) into real-world data (RWD) for evidence generation in oncology. Our objective is to describe the research and development of industry methods to promote transparency and explainability. Methods: We applied NLP with ML techniques to train, validate, and test the extraction of information from unstructured documents (e.g., clinician notes, radiology reports, lab reports, etc.) to output a set of structured variables required for RWD analysis. This research used a nationwide electronic health record (EHR)-derived database. Models were selected based on performance. Variables curated with an approach using ML extraction are those where the value is determined solely based on an ML model (i.e. not confirmed by abstraction), which identifies key information from visit notes and documents. These models do not predict future events or infer missing information. Results: We developed an approach using NLP and ML for extraction of clinically meaningful information from unstructured EHR documents and found high performance of output variables compared with variables curated by manually abstracted data. 
These extraction methods resulted in research-ready variables including initial cancer diagnosis with date, advanced/metastatic diagnosis with date, disease stage, histology, smoking status, surgery status with date, biomarker test results with dates, and oral treatments with dates. Conclusion: NLP and ML enable the extraction of retrospective clinical data in EHR with speed and scalability to help researchers learn from the experience of every person with cancer. Frontiers Media S.A. 2023-09-15 /pmc/articles/PMC10541019/ /pubmed/37781703 http://dx.doi.org/10.3389/fphar.2023.1180962 Text en Copyright © 2023 Adamson, Waskom, Blarre, Kelly, Krismer, Nemeth, Gippetti, Ritten, Harrison, Ho, Linzmayer, Bansal, Wilkinson, Amster, Estola, Benedum, Fidyk, Estévez, Shapiro and Cohen. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Pharmacology Adamson, Blythe Waskom, Michael Blarre, Auriane Kelly, Jonathan Krismer, Konstantin Nemeth, Sheila Gippetti, James Ritten, John Harrison, Katherine Ho, George Linzmayer, Robin Bansal, Tarun Wilkinson, Samuel Amster, Guy Estola, Evan Benedum, Corey M. Fidyk, Erin Estévez, Melissa Shapiro, Will Cohen, Aaron B. Approach to machine learning for extraction of real-world data variables from electronic health records |
title | Approach to machine learning for extraction of real-world data variables from electronic health records |
title_full | Approach to machine learning for extraction of real-world data variables from electronic health records |
title_fullStr | Approach to machine learning for extraction of real-world data variables from electronic health records |
title_full_unstemmed | Approach to machine learning for extraction of real-world data variables from electronic health records |
title_short | Approach to machine learning for extraction of real-world data variables from electronic health records |
title_sort | approach to machine learning for extraction of real-world data variables from electronic health records |
topic | Pharmacology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10541019/ https://www.ncbi.nlm.nih.gov/pubmed/37781703 http://dx.doi.org/10.3389/fphar.2023.1180962 |
work_keys_str_mv | AT adamsonblythe approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT waskommichael approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT blarreauriane approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT kellyjonathan approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT krismerkonstantin approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT nemethsheila approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT gippettijames approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT rittenjohn approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT harrisonkatherine approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT hogeorge approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT linzmayerrobin approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT bansaltarun approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT wilkinsonsamuel approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT amsterguy approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT estolaevan approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT benedumcoreym approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT fidykerin approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT estevezmelissa approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT shapirowill 
approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords AT cohenaaronb approachtomachinelearningforextractionofrealworlddatavariablesfromelectronichealthrecords |