Binned Data Provide Better Imputation of Missing Time Series Data from Wearables

The presence of missing values in a time-series dataset is a very common and well-known problem. Various statistical and machine learning methods have been developed to overcome this problem, with the aim of filling in the missing values in the data. However, the performances of these methods vary w...

Bibliographic Details
Main authors:	Chakrabarti, Shweta; Biswas, Nupur; Karnani, Khushi; Padul, Vijay; Jones, Lawrence D.; Kesari, Santosh; Ashili, Shashaanka
Format:	Online Article Text
Language:	English
Published:	MDPI 2023
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919790/ https://www.ncbi.nlm.nih.gov/pubmed/36772494 http://dx.doi.org/10.3390/s23031454

author	Chakrabarti, Shweta; Biswas, Nupur; Karnani, Khushi; Padul, Vijay; Jones, Lawrence D.; Kesari, Santosh; Ashili, Shashaanka
author_sort	Chakrabarti, Shweta
collection	PubMed
description	The presence of missing values in a time-series dataset is a very common and well-known problem. Various statistical and machine learning methods have been developed to overcome this problem, with the aim of filling in the missing values in the data. However, the performance of these methods varies widely and depends strongly on the type of data and the correlations within the data. In our study, we applied several well-known imputation methods, such as expectation maximization, k-nearest neighbor, iterative imputer, random forest, and simple imputer, to impute missing data obtained from smart, wearable health trackers. In this manuscript, we propose the use of data binning for imputation. We show that using data binned around the missing time interval provides better imputation than using the whole dataset. Imputation was performed for 15 min and 1 h of continuous missing data. We binned the dataset with different bin sizes, such as 15 min, 30 min, 45 min, and 1 h, and evaluated the results using root mean square error (RMSE) values. We observed that the expectation maximization algorithm worked best with binned data, followed by the simple imputer, iterative imputer, and k-nearest neighbor, whereas data binning had no effect on imputation with the random forest method. Moreover, the smallest bin sizes, 15 min and 1 h, provided the lowest RMSE values for the majority of the time frames when imputing 15 min and 1 h of missing data, respectively. Although we applied the method to digital health data, we think that it will also find applicability in other domains.
format	Online Article Text
id	pubmed-9919790
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	MDPI
record_format	MEDLINE/PubMed
spelling	pubmed-9919790 2023-02-12 Binned Data Provide Better Imputation of Missing Time Series Data from Wearables. Chakrabarti, Shweta; Biswas, Nupur; Karnani, Khushi; Padul, Vijay; Jones, Lawrence D.; Kesari, Santosh; Ashili, Shashaanka. Sensors (Basel). Article. MDPI 2023-01-28 /pmc/articles/PMC9919790/ /pubmed/36772494 http://dx.doi.org/10.3390/s23031454 Text en © 2023 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title	Binned Data Provide Better Imputation of Missing Time Series Data from Wearables
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919790/ https://www.ncbi.nlm.nih.gov/pubmed/36772494 http://dx.doi.org/10.3390/s23031454
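The binning idea in the description can be illustrated with a small sketch. This is not the authors' code: the signal is synthetic, the 1-sample-per-minute rate is assumed, plain mean imputation stands in for the "simple imputer", and a "bin" is assumed to mean the given number of minutes on each side of the gap. The function names `rmse` and `mean_impute` are hypothetical.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length arrays."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mean_impute(series, gap_start, gap_len, bin_size=None):
    """Fill the gap series[gap_start:gap_start + gap_len] with a mean value.

    bin_size=None uses every sample outside the gap (whole dataset).
    Otherwise only bin_size samples on each side of the gap are used,
    which is an assumed reading of "binned around the missing interval".
    """
    gap_end = gap_start + gap_len
    if bin_size is None:
        context = np.concatenate([series[:gap_start], series[gap_end:]])
    else:
        left = series[max(0, gap_start - bin_size):gap_start]
        right = series[gap_end:gap_end + bin_size]
        context = np.concatenate([left, right])
    return np.full(gap_len, context.mean())

# Hypothetical heart-rate-like signal: one sample per minute for 24 h,
# a slow daily cycle plus measurement noise.
rng = np.random.default_rng(0)
minutes = np.arange(24 * 60)
signal = 70 + 15 * np.sin(2 * np.pi * minutes / (24 * 60)) + rng.normal(0, 1.0, minutes.size)

# Treat 15 consecutive minutes as missing and keep the true values for scoring.
gap_start, gap_len = 360, 15
truth = signal[gap_start:gap_start + gap_len]

rmse_whole = rmse(truth, mean_impute(signal, gap_start, gap_len))
rmse_by_bin = {
    b: rmse(truth, mean_impute(signal, gap_start, gap_len, bin_size=b))
    for b in (15, 30, 45, 60)
}

print(f"whole dataset: RMSE = {rmse_whole:.2f}")
for b, value in rmse_by_bin.items():
    print(f"bin of {b} min per side: RMSE = {value:.2f}")
```

On this synthetic signal the local bins give a lower RMSE than the whole-dataset mean because the gap sits near the peak of the daily cycle, far from the global average. Real wearable data and the other imputers named in the description may behave differently.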
