Importance of missingness in baseline variables: A case study of the All of Us Research Program
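Main authors: | Cronin, Robert M.; Feng, Xiaoke; Sulieman, Lina; Mapes, Brandy; Garbett, Shawn; Able, Ashley; Hale, Ryan; Couper, Mick P.; Sansbury, Heather; Ahmedani, Brian K.; Chen, Qingxia |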
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194909/ https://www.ncbi.nlm.nih.gov/pubmed/37200348 http://dx.doi.org/10.1371/journal.pone.0285848 |
_version_ | 1785044115916849152 |
---|---|
author | Cronin, Robert M.; Feng, Xiaoke; Sulieman, Lina; Mapes, Brandy; Garbett, Shawn; Able, Ashley; Hale, Ryan; Couper, Mick P.; Sansbury, Heather; Ahmedani, Brian K.; Chen, Qingxia |
collection | PubMed |
description | OBJECTIVE: The All of Us Research Program collects data from multiple information sources, including health surveys, to build a national longitudinal research repository that researchers can use to advance precision medicine. Missing survey responses pose challenges to study conclusions. We describe missingness in All of Us baseline surveys. STUDY DESIGN AND SETTING: We extracted survey responses from May 31, 2017, to September 30, 2020. Missing percentages for groups historically underrepresented in biomedical research were compared with those for represented groups. Associations of missing percentages with age, health literacy score, and survey completion date were evaluated. We used negative binomial regression to evaluate associations between participant characteristics and the number of missed questions out of the total number of eligible questions for each participant. RESULTS: The analyzed dataset contained data for 334,183 participants who submitted at least one baseline survey. Almost all participants (97.0%) completed all baseline surveys, and only 541 (0.2%) skipped all questions in at least one baseline survey. The median skip rate was 5.0% of questions, with an interquartile range (IQR) of 2.5% to 7.9%. Historically underrepresented groups were associated with higher missingness (incidence rate ratio (IRR) [95% CI]: 1.26 [1.25, 1.27] for Black/African American compared with White). Missing percentages were similar across survey completion dates, participant ages, and health literacy scores. Skipping specific questions was associated with higher missingness (IRR [95% CI]: 1.39 [1.38, 1.40] for skipping income, 1.92 [1.89, 1.95] for skipping education, and 2.19 [2.09, 2.30] for skipping sexual and gender questions). CONCLUSION: Surveys in the All of Us Research Program will form an essential component of the data available to researchers for their analyses. Missingness was low in All of Us baseline surveys, but group differences exist. Additional statistical methods and careful analysis of surveys could help mitigate threats to the validity of conclusions. |
format | Online Article Text |
id | pubmed-10194909 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-10194909 2023-05-19 Importance of missingness in baseline variables: A case study of the All of Us Research Program Cronin, Robert M.; Feng, Xiaoke; Sulieman, Lina; Mapes, Brandy; Garbett, Shawn; Able, Ashley; Hale, Ryan; Couper, Mick P.; Sansbury, Heather; Ahmedani, Brian K.; Chen, Qingxia PLoS One Research Article OBJECTIVE: The All of Us Research Program collects data from multiple information sources, including health surveys, to build a national longitudinal research repository that researchers can use to advance precision medicine. Missing survey responses pose challenges to study conclusions. We describe missingness in All of Us baseline surveys. STUDY DESIGN AND SETTING: We extracted survey responses from May 31, 2017, to September 30, 2020. Missing percentages for groups historically underrepresented in biomedical research were compared with those for represented groups. Associations of missing percentages with age, health literacy score, and survey completion date were evaluated. We used negative binomial regression to evaluate associations between participant characteristics and the number of missed questions out of the total number of eligible questions for each participant. RESULTS: The analyzed dataset contained data for 334,183 participants who submitted at least one baseline survey. Almost all participants (97.0%) completed all baseline surveys, and only 541 (0.2%) skipped all questions in at least one baseline survey. The median skip rate was 5.0% of questions, with an interquartile range (IQR) of 2.5% to 7.9%. Historically underrepresented groups were associated with higher missingness (incidence rate ratio (IRR) [95% CI]: 1.26 [1.25, 1.27] for Black/African American compared with White). Missing percentages were similar across survey completion dates, participant ages, and health literacy scores. Skipping specific questions was associated with higher missingness (IRR [95% CI]: 1.39 [1.38, 1.40] for skipping income, 1.92 [1.89, 1.95] for skipping education, and 2.19 [2.09, 2.30] for skipping sexual and gender questions). CONCLUSION: Surveys in the All of Us Research Program will form an essential component of the data available to researchers for their analyses. Missingness was low in All of Us baseline surveys, but group differences exist. Additional statistical methods and careful analysis of surveys could help mitigate threats to the validity of conclusions. Public Library of Science 2023-05-18 /pmc/articles/PMC10194909/ /pubmed/37200348 http://dx.doi.org/10.1371/journal.pone.0285848 Text en © 2023 Cronin et al https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
title | Importance of missingness in baseline variables: A case study of the All of Us Research Program |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10194909/ https://www.ncbi.nlm.nih.gov/pubmed/37200348 http://dx.doi.org/10.1371/journal.pone.0285848 |
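The description summarizes missingness as a per-participant skip rate (missed questions out of eligible questions), reports the median and IQR of that rate, and reports incidence rate ratios (IRRs) from a negative binomial regression. The Python sketch below, using only the standard library, illustrates how those quantities relate. It assumes, since the record does not say, a log link with log(number of eligible questions) as an offset, under which exp(β) for a covariate is its IRR, and a Wald 95% interval exp(β ± 1.96·SE). It does not fit the regression, and the function names and toy data are hypothetical, not the authors' code.

```python
import math
import statistics


def skip_rates(missed, eligible):
    """Per-participant skip rate in percent: missed / eligible * 100.

    Participants with zero eligible questions are dropped because their
    rate is undefined.
    """
    return [100.0 * m / e for m, e in zip(missed, eligible) if e > 0]


def median_iqr(rates):
    """Return (median, (Q1, Q3)) of a list of skip rates.

    Uses the "inclusive" quartile method (linear interpolation between
    order statistics); other quartile conventions can give slightly
    different Q1/Q3 values on small samples.
    """
    q1, _, q3 = statistics.quantiles(rates, n=4, method="inclusive")
    return statistics.median(rates), (q1, q3)


def expected_missed(eligible, linear_predictor):
    """Mean missed-question count under a log link with a log(eligible) offset.

    Assumed model: log E[missed] = log(eligible) + linear_predictor,
    so E[missed] = eligible * exp(linear_predictor).
    """
    return eligible * math.exp(linear_predictor)


def irr_with_ci(beta, se, z=1.96):
    """IRR exp(beta) with a Wald interval exp(beta - z*se), exp(beta + z*se)."""
    return math.exp(beta), (math.exp(beta - z * se), math.exp(beta + z * se))


if __name__ == "__main__":
    # Hypothetical toy data: five participants, each eligible for 40 questions.
    rates = skip_rates([2, 1, 4, 0, 3], [40] * 5)
    print(median_iqr(rates))  # (5.0, (2.5, 7.5))

    # A binary covariate with coefficient beta multiplies the expected count
    # by exp(beta), regardless of how many questions were eligible.
    beta = math.log(1.5)
    ratio = expected_missed(40, 0.1 + beta) / expected_missed(40, 0.1)
    print(round(ratio, 6))  # 1.5
```

A negative binomial model shares this mean structure with Poisson regression; it differs by adding a dispersion parameter that lets the variance exceed the mean, which is a common reason to prefer it for count outcomes such as the number of skipped questions.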