
Binned Data Provide Better Imputation of Missing Time Series Data from Wearables

Bibliographic Details
Main authors: Chakrabarti, Shweta; Biswas, Nupur; Karnani, Khushi; Padul, Vijay; Jones, Lawrence D.; Kesari, Santosh; Ashili, Shashaanka
Format: Online article, text
Language: English
Published: MDPI 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919790/
https://www.ncbi.nlm.nih.gov/pubmed/36772494
http://dx.doi.org/10.3390/s23031454
Collection: PubMed
Abstract: Missing values in time-series data are a common and well-known problem. Various statistical and machine learning methods have been developed to fill in the missing values. However, their performance varies widely and depends strongly on the type of data and the correlations within it. In our study, we applied several well-known imputation methods (expectation maximization, k-nearest neighbor, iterative imputer, random forest, and simple imputer) to missing data obtained from smart wearable health trackers. In this manuscript, we propose the use of data binning for imputation. We show that using data binned around the missing time interval provides better imputation than using the whole dataset. Imputation was performed for 15 min and 1 h of continuous missing data. We used bin sizes of 15 min, 30 min, 45 min, and 1 h, and evaluated the results using root mean square error (RMSE). We observed that expectation maximization worked best with binned data, followed by the simple imputer, iterative imputer, and k-nearest neighbor, whereas data binning had no effect on the random forest method. Moreover, the smallest bin sizes, 15 min and 1 h, gave the lowest RMSE values for the majority of time frames when imputing 15 min and 1 h of missing data, respectively. Although we demonstrated the method on digital health data, we expect it to find applicability in other domains as well.
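To make the binning idea concrete, here is a minimal sketch using only the Python standard library. It is not the authors' pipeline: it swaps in plain mean imputation (the simplest stand-in for a mean-strategy simple imputer) instead of expectation maximization, k-nearest neighbor, iterative imputer, or random forest. It assumes one sample per minute, and it assumes a "bin around the missing interval" means up to `bin_size` samples immediately before and after the gap, which the abstract does not specify. The function names `mean_impute` and `rmse` and the synthetic signal are hypothetical and for illustration only.

```python
import math
from statistics import mean
from typing import List, Optional, Sequence


def rmse(truth: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean square error between two equal-length, non-empty sequences."""
    if len(truth) != len(predicted) or not truth:
        raise ValueError("sequences must be non-empty and of equal length")
    squared = sum((t - p) ** 2 for t, p in zip(truth, predicted))
    return math.sqrt(squared / len(truth))


def mean_impute(series: List[Optional[float]], gap_start: int, gap_end: int,
                bin_size: Optional[int] = None) -> List[Optional[float]]:
    """Hypothetical helper: fill series[gap_start:gap_end] with a context mean.

    bin_size=None: use every observed sample in the series (whole dataset).
    bin_size=k:    use only up to k samples immediately before and after the
                   gap (an ASSUMED definition of a bin around the gap).
    """
    if bin_size is None:
        context = [v for v in series if v is not None]
    else:
        before = series[max(0, gap_start - bin_size):gap_start]
        after = series[gap_end:gap_end + bin_size]
        context = [v for v in before + after if v is not None]
    fill_value = mean(context)
    filled = list(series)
    for i in range(gap_start, gap_end):
        filled[i] = fill_value
    return filled


if __name__ == "__main__":
    # Synthetic one-day signal, one sample per minute, with a slow daily
    # rhythm (hypothetical data, not from the study).
    series = [60 + 20 * math.sin(2 * math.pi * t / 1440) for t in range(1440)]
    gap_start, gap_end = 300, 315            # 15 min of continuous missing data
    truth = series[gap_start:gap_end]
    masked: List[Optional[float]] = list(series)
    masked[gap_start:gap_end] = [None] * (gap_end - gap_start)

    whole = mean_impute(masked, gap_start, gap_end)
    print("whole dataset RMSE:", rmse(truth, whole[gap_start:gap_end]))
    for k in (15, 30, 45, 60):               # assumed bin sizes, in minutes
        binned = mean_impute(masked, gap_start, gap_end, bin_size=k)
        print(f"{k:>2} min bin RMSE:", rmse(truth, binned[gap_start:gap_end]))
```

On this smooth synthetic rhythm, the local bin tracks the signal level near the gap while the whole-day mean does not, which is the intuition behind binning. Real wearable data and model-based imputers may behave differently; the abstract itself reports that binning had no effect on the random forest method.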
Record ID: pubmed-9919790
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Sensors (Basel). Published online by MDPI on 2023-01-28; PubMed record dated 2023-02-12.
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).