Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data
Main authors: | Mir, Adil Aslam; Kearfott, Kimberlee Jane; Çelebi, Fatih Vehbi; Rafique, Muhammad |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2022 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8758196/ https://www.ncbi.nlm.nih.gov/pubmed/35025953 http://dx.doi.org/10.1371/journal.pone.0262131 |
author | Mir, Adil Aslam; Kearfott, Kimberlee Jane; Çelebi, Fatih Vehbi; Rafique, Muhammad |
collection | PubMed |
description | This work studies a new methodology, imputation by feature importance (IBFI), which can be applied to any machine learning method to efficiently fill in any missing or irregularly sampled data. It applies to data missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). IBFI uses feature importance to iteratively impute missing values with any base learning algorithm. Here, IBFI is tested on soil radon gas concentration (SRGC) data, with XGBoost as the learning algorithm and missing data simulated in R for different missingness scenarios. IBFI rests on the physically meaningful assumption that SRGC depends on environmental parameters such as temperature and relative humidity. This assumption leads to a model, fitted on the complete part of the multivariate series where the control variables are available, that takes the attribute of interest as the response variable. IBFI is compared with other frequently used imputation methods: mean, median, mode, predictive mean matching (PMM), and hot-deck. The performance of the imputation methods was assessed with root mean squared error (RMSE), mean squared log error (MSLE), mean absolute percentage error (MAPE), percent bias (PB), and mean squared error (MSE). Imputation needs more care when multiple variables are missing in different samples, because machine learning methods then lack some of their control variables; IBFI appears to have an advantage in such circumstances. The Radon Time Series (RTS) data used to test IBFI were collected from 1 March 2017 to 11 May 2018, a period that included four seismic events. |
id | pubmed-8758196 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
source | PLoS One (Research Article), Public Library of Science, 2022-01-13. http://dx.doi.org/10.1371/journal.pone.0262131. Text, English. This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication. |
topic | Research Article |
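The description states that missing data were simulated in R under MCAR, MAR, and MNAR scenarios, but the record does not say how. As a minimal illustration of how the three mechanisms differ, the Python sketch below injects gaps into one column of a NumPy array. The function name `simulate_missing`, its parameters, and the threshold rules used for MAR and MNAR (drop the rows with the largest driver or target values) are hypothetical choices for illustration, not the authors' R procedure.

```python
import numpy as np


def simulate_missing(data, mechanism, rate, target=0, driver=1, seed=0):
    """Return a copy of `data` with NaNs injected into column `target`.

    Hypothetical helper (assumed rules, not the study's procedure):
      MCAR: each row is dropped with probability `rate`, independent of the data.
      MAR:  the rows with the largest values of the observed column `driver`
            are dropped.
      MNAR: the rows with the largest values of the target column itself
            are dropped.
    """
    data = np.asarray(data, dtype=float).copy()
    n = data.shape[0]
    n_missing = int(round(rate * n))
    if mechanism == "MCAR":
        rng = np.random.default_rng(seed)
        rows = rng.random(n) < rate
    elif mechanism in ("MAR", "MNAR"):
        # MAR looks at another column; MNAR looks at the value being hidden.
        col = driver if mechanism == "MAR" else target
        rows = np.zeros(n, dtype=bool)
        rows[np.argsort(data[:, col])[n - n_missing:]] = True
    else:
        raise ValueError(f"unknown mechanism: {mechanism}")
    data[rows, target] = np.nan
    return data
```

Under MCAR the gaps carry no information about the data; under MAR they can be explained by an observed column, which a model conditioned on that column can account for; under MNAR they depend on the hidden values themselves.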
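The description compares IBFI with mean, median, mode, predictive mean matching (PMM), and hot-deck imputation. The single-column baselines among these can be sketched in a few lines. The helper `impute_simple` is hypothetical; it implements random hot-deck (each gap receives a value drawn at random from the observed values of the same column) as one common variant, and it omits PMM because PMM needs a fitted regression model.

```python
from collections import Counter

import numpy as np


def impute_simple(column, strategy, seed=0):
    """Fill NaNs in a 1-D array with a single-column baseline.

    strategy: "mean", "median", "mode", or "hot_deck" (random hot-deck,
    an assumed variant: donors are drawn uniformly from observed values).
    """
    col = np.asarray(column, dtype=float).copy()
    gaps = np.isnan(col)
    observed = col[~gaps]
    if strategy == "mean":
        col[gaps] = observed.mean()
    elif strategy == "median":
        col[gaps] = np.median(observed)
    elif strategy == "mode":
        # Most frequent observed value; ties go to the value seen first.
        col[gaps] = Counter(observed.tolist()).most_common(1)[0][0]
    elif strategy == "hot_deck":
        rng = np.random.default_rng(seed)
        col[gaps] = rng.choice(observed, size=int(gaps.sum()))
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return col
```

These baselines ignore every other variable, so they cannot use the dependence of SRGC on temperature and relative humidity that the description treats as the basis of IBFI.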
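The description says IBFI uses feature importance and iteratively imputes missing values with a base learner (XGBoost in the study). The sketch below shows one way such an importance-guided loop could be organised; it is not the authors' algorithm. To run with NumPy alone it swaps XGBoost for ordinary least squares and uses the absolute Pearson correlation as a stand-in importance score. The function names, the `top_k` and `n_iter` parameters, the column-mean initial fill, and the least-missing-first column order are all assumptions.

```python
import numpy as np


def importance_scores(X, y):
    """Stand-in importance (assumption): |Pearson correlation| with y."""
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        if np.std(X[:, j]) > 0 and np.std(y) > 0:
            scores[j] = abs(np.corrcoef(X[:, j], y)[0, 1])
    return scores


def fit_predict_ols(X_train, y_train, X_new):
    """Stand-in base learner (assumption): least squares with an intercept."""
    A = np.column_stack([np.ones(len(X_train)), X_train])
    coef, *_ = np.linalg.lstsq(A, y_train, rcond=None)
    return np.column_stack([np.ones(len(X_new)), X_new]) @ coef


def importance_guided_impute(data, top_k=2, n_iter=5):
    """Hypothetical IBFI-style loop: for each incomplete column, keep the
    top_k most important other columns and re-predict that column's gaps."""
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    filled = data.copy()
    # Start from column means so every predictor has a value.
    filled[missing] = np.take(np.nanmean(data, axis=0), np.where(missing)[1])
    # Visit columns from least to most missing.
    order = np.argsort(missing.sum(axis=0), kind="stable")
    for _ in range(n_iter):
        for target in order:
            gaps = missing[:, target]
            if not gaps.any():
                continue
            others = [j for j in range(data.shape[1]) if j != target]
            seen = ~gaps
            # Rank the other columns on rows where the target is observed.
            scores = importance_scores(filled[seen][:, others], filled[seen, target])
            keep = [others[j] for j in np.argsort(scores)[::-1][:top_k]]
            filled[gaps, target] = fit_predict_ols(
                filled[seen][:, keep], filled[seen, target], filled[gaps][:, keep]
            )
    return filled
```

Because every column is pre-filled and then re-predicted on each pass, a target can still be modelled when some of its predictors are themselves incomplete, which is the multi-variable situation the description highlights. Replacing `fit_predict_ols` and `importance_scores` with an XGBoost regressor and its feature importances would bring the sketch closer to the setup described, at the cost of an extra dependency.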
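The five statistics named in the description (RMSE, MSLE, MAPE, PB, and MSE) can be computed on the true values that were masked before imputation. The function below is a sketch with a hypothetical name. Percent bias follows one common convention, 100 · Σ(imputed − observed) / Σ(observed); some sources flip the sign, so treat that choice as an assumption.

```python
import numpy as np


def imputation_errors(observed, imputed):
    """Compare masked true values with their imputed estimates."""
    y = np.asarray(observed, dtype=float)
    p = np.asarray(imputed, dtype=float)
    err = p - y
    mse = np.mean(err ** 2)
    return {
        "MSE": mse,
        "RMSE": np.sqrt(mse),
        # MSLE uses log(1 + value), so all values must be greater than -1.
        "MSLE": np.mean((np.log1p(p) - np.log1p(y)) ** 2),
        # MAPE is undefined when any true value is zero.
        "MAPE": 100.0 * np.mean(np.abs(err / y)),
        # Positive PB means the imputations overestimate in aggregate.
        "PB": 100.0 * err.sum() / y.sum(),
    }
```

For example, true values `[1, 2, 4]` imputed as `[2, 2, 2]` give an MSE of 5/3, a MAPE of 50, and a PB of about -14.3, since the single large underestimate outweighs the small overestimate.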