
Prevalence of Missing Data in the National Cancer Database and Association With Overall Survival


Bibliographic Details
Main authors: Yang, Daniel X., Khera, Rohan, Miccio, Joseph A., Jairam, Vikram, Chang, Enoch, Yu, James B., Park, Henry S., Krumholz, Harlan M., Aneja, Sanjay
Format: Online, Article, Text
Language: English
Published: American Medical Association, 2021
Subjects: Original Investigation
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7988369/
https://www.ncbi.nlm.nih.gov/pubmed/33755165
http://dx.doi.org/10.1001/jamanetworkopen.2021.1793
Abstract:

IMPORTANCE: Cancer registries are important real-world data sources consisting of data abstraction from the medical record; however, patients with unknown or missing data are underrepresented in studies that use such data sources.

OBJECTIVE: To assess the prevalence of missing data and its association with overall survival among patients with cancer.

DESIGN, SETTING, AND PARTICIPANTS: In this retrospective cohort study, all variables within the National Cancer Database were reviewed for missing or unknown values for patients with the 3 most common cancers in the US who received diagnoses from January 1, 2006, to December 31, 2015. The prevalence of patient records with missing data and the association with overall survival were assessed. Data analysis was performed from February to August 2020.

EXPOSURES: Any missing data field within a patient record among 63 variables of interest from more than 130 total variables in the National Cancer Database.

MAIN OUTCOMES AND MEASURES: Prevalence of missing data in the medical records of patients with cancer and associated 2-year overall survival.

RESULTS: A total of 1 198 749 patients with non–small cell lung cancer (mean [SD] age, 68.5 [10.9] years; 628 811 men [52.5%]), 2 120 775 patients with breast cancer (mean [SD] age, 61.0 [13.3] years; 2 101 758 women [99.1%]), and 1 158 635 patients with prostate cancer (mean [SD] age, 65.2 [9.0] years; 100% men) were included in the analysis. Among those with non–small cell lung cancer, 851 295 patients (71.0%) were missing data for variables of interest; 2-year overall survival was 33.2% for patients with missing data and 51.6% for patients with complete data (P < .001). Among those with breast cancer, 1 161 096 patients (54.7%) were missing data for variables of interest; 2-year overall survival was 93.2% for patients with missing data and 93.9% for patients with complete data (P < .001). Among those with prostate cancer, 460 167 patients (39.7%) were missing data for variables of interest; 2-year overall survival was 91.0% for patients with missing data and 95.6% for patients with complete data (P < .001).

CONCLUSIONS AND RELEVANCE: This study found that within a large cancer registry–based real-world data source, there was a high prevalence of missing data that were unable to be ascertained from the medical record. The prevalence of missing data among patients with cancer was associated with heterogeneous differences in overall survival. Improvements in documentation and data quality are necessary to make optimal use of real-world data for clinical advancements.
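The exposure definition in the abstract (a patient record counts as having missing data if any variable of interest is missing or unknown) and the reported prevalence figures can be illustrated with a minimal Python sketch. This is not the authors' analysis code. The field names (`stage`, `grade`), the sentinel values treated as "unknown", and the toy records are hypothetical; the National Cancer Database's actual variable names and unknown-value codes are not reproduced here. The script also recomputes the prevalence percentages from the counts reported in the abstract.

```python
from typing import Mapping, Optional, Sequence

Record = Mapping[str, Optional[str]]

# Hypothetical values treated as "missing or unknown". The real registry uses
# its own per-variable unknown codes, which are not modeled here.
UNKNOWN_CODES = {None, "", "unknown"}


def has_missing(record: Record, variables: Sequence[str]) -> bool:
    """True if any variable of interest is absent, empty, or coded as unknown."""
    return any(record.get(v) in UNKNOWN_CODES for v in variables)


def missing_prevalence(records: Sequence[Record], variables: Sequence[str]) -> float:
    """Fraction of records with at least one missing variable of interest."""
    if not records:
        raise ValueError("no records")
    return sum(has_missing(r, variables) for r in records) / len(records)


# Counts reported in the abstract: (patients missing data, total patients).
REPORTED = {
    "non-small cell lung cancer": (851_295, 1_198_749),
    "breast cancer": (1_161_096, 2_120_775),
    "prostate cancer": (460_167, 1_158_635),
}

if __name__ == "__main__":
    # Hypothetical toy records: the 2nd lacks "stage", the 3rd has an unknown "grade".
    toy = [
        {"stage": "II", "grade": "2"},
        {"stage": None, "grade": "1"},
        {"stage": "I", "grade": "unknown"},
        {"stage": "III", "grade": "3"},
    ]
    print(missing_prevalence(toy, ["stage", "grade"]))
    for cancer, (missing, total) in REPORTED.items():
        print(f"{cancer}: {100 * missing / total:.1f}%")
```

Run as a script, it prints 0.5 for the four toy records (two of four have a missing field), followed by 71.0%, 54.7%, and 39.7% for the three cohorts, which agree with the percentages given in the abstract.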
Journal: JAMA Netw Open (Original Investigation)
Publication date: 2021-03-23
Copyright: 2021 Yang DX et al. JAMA Network Open. This is an open access article distributed under the terms of the CC-BY License (http://creativecommons.org/licenses/by/4.0/).