
Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data

Bibliographic Details
Main authors: Mir, Adil Aslam; Kearfott, Kimberlee Jane; Çelebi, Fatih Vehbi; Rafique, Muhammad
Format: Online Article Text
Language: English
Published: Public Library of Science 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8758196/
https://www.ncbi.nlm.nih.gov/pubmed/35025953
http://dx.doi.org/10.1371/journal.pone.0262131
author Mir, Adil Aslam
Kearfott, Kimberlee Jane
Çelebi, Fatih Vehbi
Rafique, Muhammad
collection PubMed
description Imputation by feature importance (IBFI) is a new methodology that can wrap any machine learning method to efficiently fill in missing or irregularly sampled data. It applies to data missing completely at random (MCAR), missing at random (MAR), and missing not at random (MNAR). IBFI uses feature importance and iteratively imputes missing values with any base learning algorithm. In this work, IBFI is tested on soil radon gas concentration (SRGC) data. XGBoost is the learning algorithm, and missing data are simulated in R for different missingness scenarios. IBFI rests on the physically meaningful assumption that SRGC depends on environmental parameters such as temperature and relative humidity. Under this assumption, a model is fitted on the complete multivariate series, where the controls are available, with the attribute of interest as the response variable. IBFI is compared with other frequently used imputation methods, namely mean, median, mode, predictive mean matching (PMM), and hot-deck procedures. The performance of the imputation methods was assessed using root mean squared error (RMSE), mean squared log error (MSLE), mean absolute percentage error (MAPE), percent bias (PB), and mean squared error (MSE). Imputation requires more care when multiple variables are missing in different samples, which challenges machine learning methods because some controls are missing. IBFI appears to have an advantage in such circumstances. IBFI was tested on Radon Time Series (RTS) data collected from 1 March 2017 to 11 May 2018, a period that included four seismic events.
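The five error statistics named in the description can be illustrated with a short Python sketch. This is not the authors' code: the record does not give the formulas, so the definitions below are the common textbook ones and are assumptions. In particular, MSLE uses the log(1 + x) convention and percent bias is taken as 100 * sum(imputed - actual) / sum(actual). The function names, the mean-imputation baseline, and the four sample values are hypothetical and are not taken from the RTS data.

```python
import math

def mse(actual, imputed):
    """Mean squared error over the imputed positions."""
    return sum((a - p) ** 2 for a, p in zip(actual, imputed)) / len(actual)

def rmse(actual, imputed):
    """Root mean squared error."""
    return math.sqrt(mse(actual, imputed))

def msle(actual, imputed):
    """Mean squared log error, assuming the log(1 + x) convention."""
    return sum((math.log1p(a) - math.log1p(p)) ** 2
               for a, p in zip(actual, imputed)) / len(actual)

def mape(actual, imputed):
    """Mean absolute percentage error; actual values must be non-zero."""
    return 100.0 * sum(abs((a - p) / a)
                       for a, p in zip(actual, imputed)) / len(actual)

def percent_bias(actual, imputed):
    """Percent bias, assuming 100 * sum(imputed - actual) / sum(actual)."""
    return 100.0 * sum(p - a for a, p in zip(actual, imputed)) / sum(actual)

def mean_impute(series):
    """Baseline imputation: replace each gap (None) with the observed mean."""
    observed = [x for x in series if x is not None]
    fill = sum(observed) / len(observed)
    return [fill if x is None else x for x in series]

# Illustrative values only, not measurements from the RTS data set.
truth = [10.0, 20.0, 30.0, 40.0]
series = [10.0, None, 30.0, None]  # None marks a simulated gap
filled = mean_impute(series)

# Score only the positions that were actually imputed.
gaps = [i for i, x in enumerate(series) if x is None]
actual = [truth[i] for i in gaps]
imputed = [filled[i] for i in gaps]

scores = {
    "MSE": mse(actual, imputed),
    "RMSE": rmse(actual, imputed),
    "MSLE": msle(actual, imputed),
    "MAPE": mape(actual, imputed),
    "PB": percent_bias(actual, imputed),
}
print(scores)
```

Scoring only the masked positions keeps the observed values, which every method reproduces exactly, from diluting the comparison between imputation methods.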
format Online
Article
Text
id pubmed-8758196
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8758196 2022-01-14 PLoS One Research Article Public Library of Science 2022-01-13 /pmc/articles/PMC8758196/ /pubmed/35025953 http://dx.doi.org/10.1371/journal.pone.0262131 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
title Imputation by feature importance (IBFI): A methodology to envelop machine learning method for imputing missing patterns in time series data
topic Research Article