
Generate Analysis-Ready Data for Real-world Evidence: Tutorial for Harnessing Electronic Health Records With Advanced Informatic Technologies

Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental ther...


Bibliographic Details
Main Authors: Hou, Jue; Zhao, Rachel; Gronsbell, Jessica; Lin, Yucong; Bonzel, Clara-Lea; Zeng, Qingyi; Zhang, Sinian; Beaulieu-Jones, Brett K; Weber, Griffin M; Jemielita, Thomas; Wan, Shuyan Sabrina; Hong, Chuan; Cai, Tianrun; Wen, Jun; Ayakulangara Panickan, Vidul; Liaw, Kai-Li; Liao, Katherine; Cai, Tianxi
Format: Online Article Text
Language: English
Published: JMIR Publications 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10251230/
https://www.ncbi.nlm.nih.gov/pubmed/37227772
http://dx.doi.org/10.2196/45662
collection PubMed
description Although randomized controlled trials (RCTs) are the gold standard for establishing the efficacy and safety of a medical treatment, real-world evidence (RWE) generated from real-world data has been vital in postapproval monitoring and is being promoted for the regulatory process of experimental therapies. An emerging source of real-world data is electronic health records (EHRs), which contain detailed information on patient care in both structured (eg, diagnosis codes) and unstructured (eg, clinical notes and images) forms. Despite the granularity of the data available in EHRs, the critical variables required to reliably assess the relationship between a treatment and clinical outcome are challenging to extract. To address this fundamental challenge and accelerate the reliable use of EHRs for RWE, we introduce an integrated data curation and modeling pipeline consisting of 4 modules that leverage recent advances in natural language processing, computational phenotyping, and causal modeling techniques with noisy data. Module 1 consists of techniques for data harmonization. We use natural language processing to recognize clinical variables from RCT design documents and map the extracted variables to EHR features with description matching and knowledge networks. Module 2 then develops techniques for cohort construction using advanced phenotyping algorithms to both identify patients with diseases of interest and define the treatment arms. Module 3 introduces methods for variable curation, including a list of existing tools to extract baseline variables from different sources (eg, codified, free text, and medical imaging) and end points of various types (eg, death, binary, temporal, and numerical). Finally, module 4 presents validation and robust modeling methods, and we propose a strategy to create gold-standard labels for EHR variables of interest to validate data curation quality and perform subsequent causal modeling for RWE. 
In addition to the workflow proposed in our pipeline, we also develop a reporting guideline for RWE that covers the necessary information to facilitate transparent reporting and reproducibility of results. Moreover, our pipeline is highly data driven, enhancing study data with a rich variety of publicly available information and knowledge sources. We also showcase our pipeline and provide guidance on the deployment of relevant tools by revisiting the emulation of the Clinical Outcomes of Surgical Therapy Study Group Trial on laparoscopy-assisted colectomy versus open colectomy in patients with early-stage colon cancer. We also draw on existing literature on EHR emulation of RCTs together with our own studies with the Mass General Brigham EHR.
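The validation step mentioned for module 4 (checking curated EHR variables against gold-standard labels) can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `validate_against_gold`, the `ValidationSummary` type, and the toy data are hypothetical. It assumes a binary curated variable and chart-review labels for the same patients in the same order.

```python
import math
from typing import NamedTuple, Sequence


class ValidationSummary(NamedTuple):
    """Agreement between a curated binary variable and gold-standard labels."""
    sensitivity: float
    specificity: float
    ppv: float


def _ratio(numerator: int, denominator: int) -> float:
    # Return NaN instead of raising when a metric is undefined (empty denominator).
    return numerator / denominator if denominator else math.nan


def validate_against_gold(curated: Sequence[int], gold: Sequence[int]) -> ValidationSummary:
    """Compare curated 0/1 values with gold-standard 0/1 labels, patient by patient."""
    if len(curated) != len(gold):
        raise ValueError("curated and gold must cover the same patients")
    pairs = list(zip(curated, gold))
    tp = sum(1 for c, g in pairs if c and g)          # curated positive, truly positive
    fp = sum(1 for c, g in pairs if c and not g)      # curated positive, truly negative
    fn = sum(1 for c, g in pairs if not c and g)      # curated negative, truly positive
    tn = sum(1 for c, g in pairs if not c and not g)  # curated negative, truly negative
    return ValidationSummary(
        sensitivity=_ratio(tp, tp + fn),
        specificity=_ratio(tn, tn + fp),
        ppv=_ratio(tp, tp + fp),
    )


# Hypothetical toy data: algorithm-derived flags vs. chart-review labels for 6 patients.
curated_flags = [1, 1, 0, 0, 1, 0]
gold_labels = [1, 0, 0, 0, 1, 1]
summary = validate_against_gold(curated_flags, gold_labels)
print(summary)
```

In practice the gold-standard subset is usually small, so these estimates carry wide uncertainty and would normally be reported with confidence intervals.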
id pubmed-10251230
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10251230 2023-06-10 J Med Internet Res Tutorial JMIR Publications 2023-05-25 /pmc/articles/PMC10251230/ /pubmed/37227772 http://dx.doi.org/10.2196/45662 Text en
©Jue Hou, Rachel Zhao, Jessica Gronsbell, Yucong Lin, Clara-Lea Bonzel, Qingyi Zeng, Sinian Zhang, Brett K Beaulieu-Jones, Griffin M Weber, Thomas Jemielita, Shuyan Sabrina Wan, Chuan Hong, Tianrun Cai, Jun Wen, Vidul Ayakulangara Panickan, Kai-Li Liaw, Katherine Liao, Tianxi Cai. Originally published in the Journal of Medical Internet Research (https://www.jmir.org), 25.05.2023.
This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in the Journal of Medical Internet Research, is properly cited. The complete bibliographic information, a link to the original publication on https://www.jmir.org/, as well as this copyright and license information must be included.
topic Tutorial