Binned Data Provide Better Imputation of Missing Time Series Data from Wearables
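Main authors: | Chakrabarti, Shweta; Biswas, Nupur; Karnani, Khushi; Padul, Vijay; Jones, Lawrence D.; Kesari, Santosh; Ashili, Shashaanka |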
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919790/ https://www.ncbi.nlm.nih.gov/pubmed/36772494 http://dx.doi.org/10.3390/s23031454 |
_version_ | 1784886911238668288 |
---|---|
author | Chakrabarti, Shweta; Biswas, Nupur; Karnani, Khushi; Padul, Vijay; Jones, Lawrence D.; Kesari, Santosh; Ashili, Shashaanka |
collection | PubMed |
description | The presence of missing values in a time-series dataset is a common and well-known problem. Various statistical and machine learning methods have been developed to fill in such missing values. However, the performance of these methods varies widely and depends strongly on the type of data and the correlations within it. In our study, we applied several well-known imputation methods, namely expectation maximization, k-nearest neighbor, iterative imputer, random forest, and simple imputer, to impute missing data obtained from smart wearable health trackers. In this manuscript, we proposed the use of data binning for imputation. We showed that imputing from data binned around the missing time interval gives better results than imputing from the whole dataset. Imputation was performed for 15 min and 1 h of continuous missing data. We binned the dataset with bin sizes of 15 min, 30 min, 45 min, and 1 h, and evaluated the imputations using root mean square error (RMSE) values. The expectation maximization algorithm performed best with binned data, followed by the simple imputer, iterative imputer, and k-nearest neighbor, whereas data binning had no effect on the random forest method. Moreover, the smallest bin sizes of 15 min and 1 h provided the lowest RMSE values for the majority of the time frames when imputing 15 min and 1 h of missing data, respectively. Although the method was developed for digital health data, we think it will also be applicable in other domains. |
format | Online Article Text |
id | pubmed-9919790 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-9919790; 2023-02-12; Sensors (Basel); Article; MDPI 2023-01-28; /pmc/articles/PMC9919790/; /pubmed/36772494; http://dx.doi.org/10.3390/s23031454; Text; en; © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | Binned Data Provide Better Imputation of Missing Time Series Data from Wearables |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9919790/ https://www.ncbi.nlm.nih.gov/pubmed/36772494 http://dx.doi.org/10.3390/s23031454 |
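The abstract's central idea can be sketched as follows: fill a gap using only observations in a bin around the missing interval rather than the whole dataset, then score the result by RMSE. This is a minimal illustration under stated assumptions, not the authors' implementation. It uses a plain mean fill, similar in spirit to a simple imputer with a mean strategy. It runs on a hypothetical synthetic minute-level signal rather than real tracker data. It also assumes that a "bin" means the observed samples within `bin_minutes` before and after the gap; the paper's exact binning scheme may differ. The helper names `synthetic_series`, `mask_gap`, `mean_impute` and `rmse` are hypothetical.

```python
import math
from typing import List, Optional, Sequence


def synthetic_series(n_minutes: int = 1440) -> List[float]:
    """Hypothetical minute-level heart-rate-like signal: a daily cycle plus a faster wiggle."""
    return [
        70.0 + 10.0 * math.sin(2 * math.pi * t / 1440) + 2.0 * math.sin(t / 7.0)
        for t in range(n_minutes)
    ]


def mask_gap(series: Sequence[float], start: int, length: int) -> List[Optional[float]]:
    """Simulate `length` minutes of continuous missing data starting at `start`."""
    masked: List[Optional[float]] = list(series)
    for t in range(start, start + length):
        masked[t] = None
    return masked


def mean_impute(
    masked: Sequence[Optional[float]],
    start: int,
    length: int,
    bin_minutes: Optional[int] = None,
) -> List[Optional[float]]:
    """Fill the gap with the mean of observed values.

    bin_minutes=None uses the whole series; otherwise only observed values within
    `bin_minutes` before and after the gap (the assumed "bin") are used.
    """
    if bin_minutes is None:
        lo, hi = 0, len(masked)
    else:
        lo = max(0, start - bin_minutes)
        hi = min(len(masked), start + length + bin_minutes)
    observed = [v for v in masked[lo:hi] if v is not None]
    if not observed:
        raise ValueError("no observed values in the chosen window")
    fill = sum(observed) / len(observed)
    imputed = list(masked)
    for t in range(start, start + length):
        imputed[t] = fill
    return imputed


def rmse(truth: Sequence[float], imputed: Sequence[Optional[float]], start: int, length: int) -> float:
    """Root mean square error computed over the imputed gap only."""
    errors = []
    for t in range(start, start + length):
        value = imputed[t]
        if value is None:
            raise ValueError(f"minute {t} was not imputed")
        errors.append((truth[t] - value) ** 2)
    return math.sqrt(sum(errors) / len(errors))


truth = synthetic_series()
gap_start, gap_len = 360, 15  # a 15 min gap near the peak of the daily cycle
masked = mask_gap(truth, gap_start, gap_len)
whole = rmse(truth, mean_impute(masked, gap_start, gap_len), gap_start, gap_len)
print(f"whole dataset: RMSE = {whole:.2f}")
for bin_minutes in (15, 30, 45, 60):
    imputed = mean_impute(masked, gap_start, gap_len, bin_minutes)
    print(f"{bin_minutes} min bin: RMSE = {rmse(truth, imputed, gap_start, gap_len):.2f}")
```

In this toy signal the gap sits where the daily cycle is far from its 24 h mean. A fill drawn from nearby minutes therefore tracks the local level, while a whole-day mean does not. Real wearable data will be noisier. Substituting another imputer, such as k-nearest neighbor or expectation maximization, for the mean fill is an assumption left to the reader.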