
Strategies for Imputation of High-Resolution Environmental Data in Clinical Randomized Controlled Trials

Time series data collected in clinical trials can have varying degrees of missingness, adding challenges during statistical analyses. An additional layer of complexity is introduced for missing data in randomized controlled trials (RCT), where researchers must remain blinded between intervention and...


Bibliographic Details
Main authors: Kim, Yohan, Kelly, Scott, Krishnan, Deepu, Falletta, Jay, Wilmot, Kerryn
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8835538/
https://www.ncbi.nlm.nih.gov/pubmed/35162331
http://dx.doi.org/10.3390/ijerph19031307
_version_ 1784649458443616256
author Kim, Yohan
Kelly, Scott
Krishnan, Deepu
Falletta, Jay
Wilmot, Kerryn
author_facet Kim, Yohan
Kelly, Scott
Krishnan, Deepu
Falletta, Jay
Wilmot, Kerryn
author_sort Kim, Yohan
collection PubMed
description Time series data collected in clinical trials can have varying degrees of missingness, adding challenges during statistical analyses. An additional layer of complexity is introduced for missing data in randomized controlled trials (RCT), where researchers must remain blinded between intervention and control groups. Such restriction severely limits the applicability of conventional imputation methods that would utilize other participants’ data for improved performance. This paper explores and compares various methods to impute high-resolution temperature logger data in RCT settings. In addition to the conventional non-parametric approaches, we propose a spline regression (SR) approach that captures the dynamics of indoor temperature by time of day that is unique to each participant. We investigate how the inclusion of external temperature and energy use can improve the model performance. Results show that SR imputation results in 16% smaller root mean squared error (RMSE) compared to conventional imputation methods, with the gap widening to 22% when more than half of data is missing. The SR method is particularly useful in cases where missingness occurs simultaneously for multiple participants, such as concurrent battery failures. We demonstrate how proper modelling of periodic dynamics can lead to significantly improved imputation performance, even with limited data.
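The description above mentions imputing each participant's indoor temperature from a time-of-day curve and comparing methods by RMSE. The following is only a minimal sketch of that general idea, not the authors' model: it assumes a periodic piecewise-linear (degree-1) spline basis fitted by ordinary least squares, uses synthetic data, and omits external temperature and energy use. The names `hat_basis`, `impute_by_time_of_day`, and `rmse` are hypothetical.

```python
import numpy as np

def hat_basis(hours, n_knots=8):
    """Periodic piecewise-linear spline basis over a 24 h cycle (assumed form)."""
    hours = np.asarray(hours, dtype=float) % 24.0
    width = 24.0 / n_knots
    knots = np.arange(n_knots) * width
    # Circular distance from each time stamp to each knot.
    d = np.abs(hours[:, None] - knots[None, :])
    d = np.minimum(d, 24.0 - d)
    # Hat functions: 1 at the knot, falling linearly to 0 at the neighbours.
    return np.clip(1.0 - d / width, 0.0, None)

def impute_by_time_of_day(hours, temps, n_knots=8):
    """Fill NaNs using a curve fitted to this participant's own observed data."""
    temps = np.asarray(temps, dtype=float)
    missing = np.isnan(temps)
    X = hat_basis(hours, n_knots)
    # Least-squares fit on observed rows only. The observed data should
    # cover every part of the day, otherwise some knots are unidentified.
    coef, *_ = np.linalg.lstsq(X[~missing], temps[~missing], rcond=None)
    filled = temps.copy()
    filled[missing] = X[missing] @ coef
    return filled

def rmse(a, b):
    """Root mean squared error between two equal-length arrays."""
    return float(np.sqrt(np.mean((np.asarray(a) - np.asarray(b)) ** 2)))

# Synthetic example: 7 days of 30-minute readings for one participant.
rng = np.random.default_rng(0)
hours = (np.arange(7 * 48) * 0.5) % 24.0
temps_true = 21.0 + 3.0 * np.sin(2 * np.pi * (hours - 9.0) / 24.0)
temps_true = temps_true + rng.normal(0.0, 0.3, size=hours.size)

# Simulate a 36-hour logger outage.
observed = temps_true.copy()
observed[100:172] = np.nan
missing = np.isnan(observed)

filled = impute_by_time_of_day(hours, observed)
rmse_spline = rmse(filled[missing], temps_true[missing])

# Baseline: replace every gap with the participant's overall mean.
mean_filled = np.where(missing, np.nanmean(observed), observed)
rmse_mean = rmse(mean_filled[missing], temps_true[missing])

print(f"time-of-day spline RMSE: {rmse_spline:.2f}")
print(f"overall-mean RMSE:       {rmse_mean:.2f}")
```

Because the fit uses only the participant's own readings, this kind of approach does not need data from other participants, which matches the blinding constraint the description raises.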
format Online
Article
Text
id pubmed-8835538
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-8835538 2022-02-12 Strategies for Imputation of High-Resolution Environmental Data in Clinical Randomized Controlled Trials Kim, Yohan Kelly, Scott Krishnan, Deepu Falletta, Jay Wilmot, Kerryn Int J Environ Res Public Health Article Time series data collected in clinical trials can have varying degrees of missingness, adding challenges during statistical analyses. An additional layer of complexity is introduced for missing data in randomized controlled trials (RCT), where researchers must remain blinded between intervention and control groups. Such restriction severely limits the applicability of conventional imputation methods that would utilize other participants’ data for improved performance. This paper explores and compares various methods to impute high-resolution temperature logger data in RCT settings. In addition to the conventional non-parametric approaches, we propose a spline regression (SR) approach that captures the dynamics of indoor temperature by time of day that is unique to each participant. We investigate how the inclusion of external temperature and energy use can improve the model performance. Results show that SR imputation results in 16% smaller root mean squared error (RMSE) compared to conventional imputation methods, with the gap widening to 22% when more than half of data is missing. The SR method is particularly useful in cases where missingness occurs simultaneously for multiple participants, such as concurrent battery failures. We demonstrate how proper modelling of periodic dynamics can lead to significantly improved imputation performance, even with limited data. MDPI 2022-01-24 /pmc/articles/PMC8835538/ /pubmed/35162331 http://dx.doi.org/10.3390/ijerph19031307 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Kim, Yohan
Kelly, Scott
Krishnan, Deepu
Falletta, Jay
Wilmot, Kerryn
Strategies for Imputation of High-Resolution Environmental Data in Clinical Randomized Controlled Trials
title Strategies for Imputation of High-Resolution Environmental Data in Clinical Randomized Controlled Trials
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8835538/
https://www.ncbi.nlm.nih.gov/pubmed/35162331
http://dx.doi.org/10.3390/ijerph19031307