Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C)
BACKGROUND: Multi-institution electronic health records (EHR) are a rich source of real world data (RWD) for generating real world evidence (RWE) regarding the utilization, benefits and harms of medical interventions. They provide access to clinical data from large pooled patient populations in addi...
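Main authors: | Sidky, Hythem; Young, Jessica C.; Girvin, Andrew T.; Lee, Eileen; Shao, Yu Raymond; Hotaling, Nathan; Michael, Sam; Wilkins, Kenneth J.; Setoguchi, Soko; Funk, Michele Jonsson |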
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9936475/ https://www.ncbi.nlm.nih.gov/pubmed/36800930 http://dx.doi.org/10.1186/s12874-023-01839-2 |
_version_ | 1784890238000168960 |
---|---|
author | Sidky, Hythem Young, Jessica C. Girvin, Andrew T. Lee, Eileen Shao, Yu Raymond Hotaling, Nathan Michael, Sam Wilkins, Kenneth J. Setoguchi, Soko Funk, Michele Jonsson |
author_facet | Sidky, Hythem Young, Jessica C. Girvin, Andrew T. Lee, Eileen Shao, Yu Raymond Hotaling, Nathan Michael, Sam Wilkins, Kenneth J. Setoguchi, Soko Funk, Michele Jonsson |
author_sort | Sidky, Hythem |
collection | PubMed |
description | BACKGROUND: Multi-institution electronic health records (EHR) are a rich source of real world data (RWD) for generating real world evidence (RWE) regarding the utilization, benefits and harms of medical interventions. They provide access to clinical data from large pooled patient populations in addition to laboratory measurements unavailable in insurance claims-based data. However, secondary use of these data for research requires specialized knowledge and careful evaluation of data quality and completeness. We discuss data quality assessments undertaken during the conduct of prep-to-research, focusing on the investigation of treatment safety and effectiveness. METHODS: Using the National COVID Cohort Collaborative (N3C) enclave, we defined a patient population using criteria typical in non-interventional inpatient drug effectiveness studies. We present the challenges encountered when constructing this dataset, beginning with an examination of data quality across data partners. We then discuss the methods and best practices used to operationalize several important study elements: exposure to treatment, baseline health comorbidities, and key outcomes of interest. RESULTS: We share our experiences and lessons learned when working with heterogeneous EHR data from over 65 healthcare institutions and 4 common data models. We discuss six key areas of data variability and quality. (1) The specific EHR data elements captured from a site can vary depending on source data model and practice. (2) Data missingness remains a significant issue. (3) Drug exposures can be recorded at different levels and may not contain route of administration or dosage information. (4) Reconstruction of continuous drug exposure intervals may not always be possible. (5) EHR discontinuity is a major concern for capturing history of prior treatment and comorbidities. Lastly, (6) access to EHR data alone limits the potential outcomes which can be used in studies. 
CONCLUSIONS: The creation of large scale centralized multi-site EHR databases such as N3C enables a wide range of research aimed at better understanding treatments and health impacts of many conditions including COVID-19. As with all observational research, it is important that research teams engage with appropriate domain experts to understand the data in order to define research questions that are both clinically important and feasible to address using these real world data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-023-01839-2. |
format | Online Article Text |
id | pubmed-9936475 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-99364752023-02-17 Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C) Sidky, Hythem Young, Jessica C. Girvin, Andrew T. Lee, Eileen Shao, Yu Raymond Hotaling, Nathan Michael, Sam Wilkins, Kenneth J. Setoguchi, Soko Funk, Michele Jonsson BMC Med Res Methodol Research BACKGROUND: Multi-institution electronic health records (EHR) are a rich source of real world data (RWD) for generating real world evidence (RWE) regarding the utilization, benefits and harms of medical interventions. They provide access to clinical data from large pooled patient populations in addition to laboratory measurements unavailable in insurance claims-based data. However, secondary use of these data for research requires specialized knowledge and careful evaluation of data quality and completeness. We discuss data quality assessments undertaken during the conduct of prep-to-research, focusing on the investigation of treatment safety and effectiveness. METHODS: Using the National COVID Cohort Collaborative (N3C) enclave, we defined a patient population using criteria typical in non-interventional inpatient drug effectiveness studies. We present the challenges encountered when constructing this dataset, beginning with an examination of data quality across data partners. We then discuss the methods and best practices used to operationalize several important study elements: exposure to treatment, baseline health comorbidities, and key outcomes of interest. RESULTS: We share our experiences and lessons learned when working with heterogeneous EHR data from over 65 healthcare institutions and 4 common data models. We discuss six key areas of data variability and quality. (1) The specific EHR data elements captured from a site can vary depending on source data model and practice. (2) Data missingness remains a significant issue. 
(3) Drug exposures can be recorded at different levels and may not contain route of administration or dosage information. (4) Reconstruction of continuous drug exposure intervals may not always be possible. (5) EHR discontinuity is a major concern for capturing history of prior treatment and comorbidities. Lastly, (6) access to EHR data alone limits the potential outcomes which can be used in studies. CONCLUSIONS: The creation of large scale centralized multi-site EHR databases such as N3C enables a wide range of research aimed at better understanding treatments and health impacts of many conditions including COVID-19. As with all observational research, it is important that research teams engage with appropriate domain experts to understand the data in order to define research questions that are both clinically important and feasible to address using these real world data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-023-01839-2. BioMed Central 2023-02-17 /pmc/articles/PMC9936475/ /pubmed/36800930 http://dx.doi.org/10.1186/s12874-023-01839-2 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. 
To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Sidky, Hythem Young, Jessica C. Girvin, Andrew T. Lee, Eileen Shao, Yu Raymond Hotaling, Nathan Michael, Sam Wilkins, Kenneth J. Setoguchi, Soko Funk, Michele Jonsson Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C) |
title | Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C) |
title_full | Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C) |
title_fullStr | Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C) |
title_full_unstemmed | Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C) |
title_short | Data quality considerations for evaluating COVID-19 treatments using real world data: learnings from the National COVID Cohort Collaborative (N3C) |
title_sort | data quality considerations for evaluating covid-19 treatments using real world data: learnings from the national covid cohort collaborative (n3c) |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9936475/ https://www.ncbi.nlm.nih.gov/pubmed/36800930 http://dx.doi.org/10.1186/s12874-023-01839-2 |
work_keys_str_mv | AT sidkyhythem dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT youngjessicac dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT girvinandrewt dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT leeeileen dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT shaoyuraymond dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT hotalingnathan dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT michaelsam dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT wilkinskennethj dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT setoguchisoko dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT funkmichelejonsson dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c AT dataqualityconsiderationsforevaluatingcovid19treatmentsusingrealworlddatalearningsfromthenationalcovidcohortcollaborativen3c |