Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C)

Bibliographic Details
Main authors: Sidky, Hythem; Young, Jessica C.; Girvin, Andrew T.; Lee, Eileen; Shao, Yu Raymond; Hotaling, Nathan; Michael, Sam; Wilkins, Kenneth J.; Setoguchi, Soko; Funk, Michele Jonsson
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9936475/
https://www.ncbi.nlm.nih.gov/pubmed/36800930
http://dx.doi.org/10.1186/s12874-023-01839-2
author Sidky, Hythem
Young, Jessica C.
Girvin, Andrew T.
Lee, Eileen
Shao, Yu Raymond
Hotaling, Nathan
Michael, Sam
Wilkins, Kenneth J.
Setoguchi, Soko
Funk, Michele Jonsson
collection PubMed
description BACKGROUND: Multi-institution electronic health records (EHR) are a rich source of real world data (RWD) for generating real world evidence (RWE) regarding the utilization, benefits and harms of medical interventions. They provide access to clinical data from large pooled patient populations in addition to laboratory measurements unavailable in insurance claims-based data. However, secondary use of these data for research requires specialized knowledge and careful evaluation of data quality and completeness. We discuss data quality assessments undertaken during the conduct of prep-to-research, focusing on the investigation of treatment safety and effectiveness.
METHODS: Using the National COVID Cohort Collaborative (N3C) enclave, we defined a patient population using criteria typical in non-interventional inpatient drug effectiveness studies. We present the challenges encountered when constructing this dataset, beginning with an examination of data quality across data partners. We then discuss the methods and best practices used to operationalize several important study elements: exposure to treatment, baseline health comorbidities, and key outcomes of interest.
RESULTS: We share our experiences and lessons learned when working with heterogeneous EHR data from over 65 healthcare institutions and 4 common data models. We discuss six key areas of data variability and quality. (1) The specific EHR data elements captured from a site can vary depending on source data model and practice. (2) Data missingness remains a significant issue. (3) Drug exposures can be recorded at different levels and may not contain route of administration or dosage information. (4) Reconstruction of continuous drug exposure intervals may not always be possible. (5) EHR discontinuity is a major concern for capturing history of prior treatment and comorbidities. Lastly, (6) access to EHR data alone limits the potential outcomes which can be used in studies.
CONCLUSIONS: The creation of large scale centralized multi-site EHR databases such as N3C enables a wide range of research aimed at better understanding treatments and health impacts of many conditions including COVID-19. As with all observational research, it is important that research teams engage with appropriate domain experts to understand the data in order to define research questions that are both clinically important and feasible to address using these real world data.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-023-01839-2.
format Online
Article
Text
id pubmed-9936475
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9936475 2023-02-17 BMC Med Res Methodol Research BioMed Central 2023-02-17 /pmc/articles/PMC9936475/ /pubmed/36800930 http://dx.doi.org/10.1186/s12874-023-01839-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C)
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9936475/
https://www.ncbi.nlm.nih.gov/pubmed/36800930
http://dx.doi.org/10.1186/s12874-023-01839-2