
Multiple Imputation of Missing Race and Ethnicity in CDC COVID-19 Case-Level Surveillance Data


Bibliographic Details
Main Authors: Zhang, Guangyu; Rose, Charles E.; Zhang, Yujia; Li, Rui; Lee, Florence C.; Massetti, Greta; Adams, Laura E.
Format: Online Article Text
Language: English
Published: 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8967240/
https://www.ncbi.nlm.nih.gov/pubmed/35368775
http://dx.doi.org/10.6000/1929-6029.2022.11.01
author Zhang, Guangyu
Rose, Charles E.
Zhang, Yujia
Li, Rui
Lee, Florence C.
Massetti, Greta
Adams, Laura E.
collection PubMed
description The COVID-19 pandemic has resulted in a disproportionate burden on racial and ethnic minority groups, but incompleteness in surveillance data limits understanding of disparities. CDC’s case-based surveillance system contains case-level information on most COVID-19 cases in the United States. Data analyzed in this paper contain COVID-19 cases with case-level information through September 25, 2020, which represent 70.9% of all COVID-19 cases reported to CDC during the period. Case-level surveillance data are used to investigate COVID-19 disparities by race/ethnicity, sex, and age. However, demographic information on race and ethnicity is missing for a substantial percentage of COVID-19 cases (e.g., 35.8% and 47.2% of cases analyzed were missing race and ethnicity information, respectively). Our goal in this study was to impute missing race and ethnicity to derive more accurate incidence and incidence rate ratio (IRR) estimates for different racial and ethnic groups, and evaluate the results from imputation compared to complete case analysis, which involves removing cases with missing race/ethnicity information from the analysis. Two multiple imputation (MI) models were developed. Model 1 imputes race using six binary race variables, and Model 2 imputes race as a composite multinomial variable. Our evaluation found that compared with complete case analysis, MI reduced biases and improved coverage on incidence and IRR estimates for all race/ethnicity groups, except for the Non-Hispanic Multiple/other group. Our research highlights the importance of supplementing complete case analysis with additional methods of analysis to better describe racial and ethnic disparities. When race and ethnicity data are missing, multiple imputation may provide more accurate incidence and IRR estimates to monitor these disparities in tandem with efforts to improve the collection of race and ethnicity information for pandemic surveillance.
format Online
Article
Text
id pubmed-8967240
institution National Center for Biotechnology Information
language English
publishDate 2022
record_format MEDLINE/PubMed
spelling pubmed-8967240 2023-01-28 Int J Stat Med Res Article 2022-01-28 /pmc/articles/PMC8967240/ /pubmed/35368775 http://dx.doi.org/10.6000/1929-6029.2022.11.01 Text en https://creativecommons.org/licenses/by/4.0/ This is an open access article licensed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium, provided the work is properly cited.
title Multiple Imputation of Missing Race and Ethnicity in CDC COVID-19 Case-Level Surveillance Data
topic Article
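
The description in this record reports incidence rate ratio (IRR) estimates obtained from multiply imputed data, but it does not say how the per-imputation results were combined. The Python sketch below shows one common approach, Rubin's rules applied to log IRRs. That choice, the function names, the case counts, and the population sizes are all assumptions made for illustration; none of them is taken from the article.

```python
import math
from statistics import mean, variance


def log_irr_and_variance(cases_a, pop_a, cases_b, pop_b):
    """Log IRR of group A vs. reference group B.

    Uses the usual large-sample variance for Poisson counts:
    1/cases_a + 1/cases_b.
    """
    log_irr = math.log((cases_a / pop_a) / (cases_b / pop_b))
    return log_irr, 1 / cases_a + 1 / cases_b


def pool_rubin(estimates, variances):
    """Combine m per-imputation estimates with Rubin's rules."""
    m = len(estimates)
    pooled = mean(estimates)        # pooled point estimate
    within = mean(variances)        # average within-imputation variance
    between = variance(estimates)   # between-imputation variance (m - 1 denominator)
    total = within + (1 + 1 / m) * between
    return pooled, total


# Hypothetical data: after each of m = 3 imputations, the number of cases
# assigned to group A and to reference group B (made-up numbers).
imputed_counts = [(1200, 1000), (1250, 950), (1180, 1020)]
POP_A, POP_B = 50_000, 100_000  # assumed population denominators

results = [log_irr_and_variance(a, POP_A, b, POP_B) for a, b in imputed_counts]
log_irrs = [r[0] for r in results]
variances = [r[1] for r in results]

pooled_log_irr, total_var = pool_rubin(log_irrs, variances)
se = math.sqrt(total_var)

pooled_irr = math.exp(pooled_log_irr)
# Simplification: normal quantile 1.96 instead of the t reference
# distribution with Rubin's degrees of freedom.
ci = (math.exp(pooled_log_irr - 1.96 * se), math.exp(pooled_log_irr + 1.96 * se))

print(f"Pooled IRR: {pooled_irr:.2f} (95% CI {ci[0]:.2f} to {ci[1]:.2f})")
```

Pooling is done on the log scale because the sampling distribution of a log ratio is closer to normal. The total variance adds the between-imputation spread to the average within-imputation variance, so the interval reflects uncertainty about the imputed race and ethnicity values. A complete case analysis would instead drop the cases with missing values and compute a single IRR from the remaining records.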