
Importance of missingness in baseline variables: A case study of the All of Us Research Program


Bibliographic Details
Main Authors: Cronin, Robert M., Feng, Xiaoke, Sulieman, Lina, Mapes, Brandy, Garbett, Shawn, Able, Ashley, Hale, Ryan, Couper, Mick P., Sansbury, Heather, Ahmedani, Brian K., Chen, Qingxia
Format: Online Article Text
Language: English
Published: Public Library of Science 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194909/
https://www.ncbi.nlm.nih.gov/pubmed/37200348
http://dx.doi.org/10.1371/journal.pone.0285848
author Cronin, Robert M.
Feng, Xiaoke
Sulieman, Lina
Mapes, Brandy
Garbett, Shawn
Able, Ashley
Hale, Ryan
Couper, Mick P.
Sansbury, Heather
Ahmedani, Brian K.
Chen, Qingxia
collection PubMed
description OBJECTIVE: The All of Us Research Program collects data from multiple information sources, including health surveys, to build a national longitudinal research repository that researchers can use to advance precision medicine. Missing survey responses pose challenges to study conclusions. We describe missingness in All of Us baseline surveys. STUDY DESIGN AND SETTING: We extracted survey responses submitted between May 31, 2017, and September 30, 2020. Missing percentages for groups historically underrepresented in biomedical research were compared to those for represented groups. Associations of missing percentages with age, health literacy score, and survey completion date were evaluated. We used negative binomial regression to evaluate the association of participant characteristics with the number of missed questions out of the total eligible questions for each participant. RESULTS: The dataset analyzed contained data for 334,183 participants who submitted at least one baseline survey. Almost all (97.0%) of the participants completed all baseline surveys, and only 541 (0.2%) participants skipped all questions in at least one of the baseline surveys. The median skip rate was 5.0% of the questions, with an interquartile range (IQR) of 2.5% to 7.9%. Historically underrepresented groups were associated with higher missingness (incidence rate ratio (IRR) [95% CI]: 1.26 [1.25, 1.27] for Black/African American compared to White). Missing percentages were similar by survey completion date, participant age, and health literacy score. Skipping specific questions was associated with higher missingness (IRRs [95% CI]: 1.39 [1.38, 1.40] for skipping income, 1.92 [1.89, 1.95] for skipping education, 2.19 [2.09, 2.30] for skipping sexual and gender questions). CONCLUSION: Surveys in the All of Us Research Program will form an essential component of the data researchers can use to perform their analyses. Missingness was low in All of Us baseline surveys, but group differences exist. Additional statistical methods and careful analysis of surveys could help mitigate challenges to the validity of conclusions.
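The quantities reported in the abstract can be illustrated with a minimal sketch. It assumes a per-participant skip rate defined as missed questions divided by eligible questions, and it uses the standard relation that an incidence rate ratio is the exponential of a log-scale regression coefficient. The function names, the toy counts, and the standard error below are hypothetical and are not taken from the study; this is not the authors' code.

```python
import math
from statistics import quantiles


def skip_rate(missed: int, eligible: int) -> float:
    """Fraction of eligible questions that one participant skipped."""
    if eligible <= 0:
        raise ValueError("eligible must be positive")
    return missed / eligible


def irr_with_ci(beta: float, se: float, z: float = 1.96):
    """Turn a log-scale coefficient and its standard error into
    an incidence rate ratio with a Wald-type 95% interval."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


# Hypothetical toy data, (missed, eligible) per participant; NOT study data.
toy = [(2, 80), (4, 80), (5, 100), (8, 100), (1, 50)]
rates = sorted(skip_rate(m, e) for m, e in toy)

# Quartiles of the skip rate: q1 and q3 bound the interquartile range.
q1, median, q3 = quantiles(rates, n=4, method="inclusive")
print(f"median skip rate {median:.1%}, IQR {q1:.1%} to {q3:.1%}")

# A coefficient of log(1.26) corresponds to an IRR of 1.26;
# the standard error 0.004 is an assumed value for illustration only.
irr, lower, upper = irr_with_ci(math.log(1.26), 0.004)
print(f"IRR {irr:.2f} [{lower:.2f}, {upper:.2f}]")
```

In a count model such as negative binomial regression, the number of eligible questions would typically enter as an exposure or offset term, so that the exponentiated coefficients compare skip rates rather than raw counts.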
format Online
Article
Text
id pubmed-10194909
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-10194909 2023-05-19 PLoS One Research Article Public Library of Science 2023-05-18 /pmc/articles/PMC10194909/ /pubmed/37200348 http://dx.doi.org/10.1371/journal.pone.0285848 Text en © 2023 Cronin et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Importance of missingness in baseline variables: A case study of the All of Us Research Program
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194909/
https://www.ncbi.nlm.nih.gov/pubmed/37200348
http://dx.doi.org/10.1371/journal.pone.0285848