Dynamic model updating (DMU) approach for statistical learning model building with missing data

BACKGROUND: Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. Strategies in the literature rely on either removing the samples with missing values, as in complete case analysis (CCA), or imputing the info...

Bibliographic Details
Main Authors: Jain, Rahi, Xu, Wei
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8086098/
https://www.ncbi.nlm.nih.gov/pubmed/33926384
http://dx.doi.org/10.1186/s12859-021-04138-z
_version_ 1783686457725026304
author Jain, Rahi
Xu, Wei
collection PubMed
description BACKGROUND: Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. Strategies in the literature rely on either removing the samples with missing values, as in complete case analysis (CCA), or imputing the missing information, as in predictive mean matching (PMM) implemented in MICE. Limitations of these strategies include information loss and uncertainty about how close the imputed values are to the true missing values. Further, when medical data arrive piecemeal, these strategies must wait for data collection to finish before a complete dataset is available for statistical modelling. METHOD AND RESULTS: This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to build the statistical models. It uses hierarchical clustering to segment the original dataset into small complete datasets and then fits a Bayesian regression on each of them. Predictor estimates are updated using the posterior estimates from each dataset. The performance of DMU is evaluated on both simulated data and real studies and is better than or on par with that of other approaches such as CCA and PMM. CONCLUSION: The DMU approach provides an alternative to the existing approaches of information elimination and imputation for processing datasets with missing values. While the study applied the approach to continuous cross-sectional data, it can also be applied to longitudinal, categorical and time-to-event biological data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04138-z.
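The workflow summarized in the abstract (segment the data into complete blocks, fit a Bayesian regression on each block, and carry each posterior forward as the next prior) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a known noise variance, independent Gaussian priors per coefficient, and uncorrelated predictors, and it groups rows by identical missingness pattern as a simple stand-in for the hierarchical clustering the study uses. The names `segment_by_pattern`, `dmu_sketch`, `noise_var`, and `prior_var` are hypothetical.

```python
import numpy as np


def segment_by_pattern(X):
    """Group row indices by which predictors are observed (non-NaN).

    Stand-in for the study's hierarchical clustering: rows sharing the
    same missingness pattern form one small complete dataset.
    """
    groups = {}
    for i, row in enumerate(~np.isnan(X)):
        groups.setdefault(tuple(row), []).append(i)
    return groups


def dmu_sketch(X, y, noise_var=1.0, prior_var=100.0):
    """Update per-coefficient Gaussian beliefs one segment at a time."""
    p = X.shape[1]
    mean = np.zeros(p)           # prior means (assumption: zero-centred)
    var = np.full(p, prior_var)  # prior variances (assumption: independent)
    for pattern, rows in segment_by_pattern(X).items():
        cols = np.flatnonzero(pattern)
        if cols.size == 0:
            continue  # segment has no observed predictors
        Xs, ys = X[np.ix_(rows, cols)], y[rows]
        # Conjugate Gaussian update with known noise variance.
        prior_prec = np.diag(1.0 / var[cols])
        post_prec = prior_prec + Xs.T @ Xs / noise_var
        post_cov = np.linalg.inv(post_prec)
        post_mean = post_cov @ (prior_prec @ mean[cols] + Xs.T @ ys / noise_var)
        # The posterior becomes the prior for the next segment; only the
        # diagonal of the covariance is kept (a simplification).
        mean[cols], var[cols] = post_mean, np.diag(post_cov)
    return mean, var


# Simulated example: three predictors, two blocks each missing one column.
rng = np.random.default_rng(0)
n, true_beta = 600, np.array([1.5, -2.0, 0.5])
X = rng.normal(size=(n, 3))
y = X @ true_beta + rng.normal(scale=0.5, size=n)
X_missing = X.copy()
X_missing[:200, 2] = np.nan     # first block lacks the third predictor
X_missing[200:400, 0] = np.nan  # second block lacks the first predictor
beta_hat, beta_var = dmu_sketch(X_missing, y, noise_var=0.25)
print(np.round(beta_hat, 2))
```

Every row contributes to the estimate of each predictor it actually observes, so nothing is discarded as in CCA and nothing is imputed as in PMM. Dropping a predictor from a segment's regression is only harmless here because the simulated predictors are uncorrelated; with correlated predictors this simplification would bias the estimates.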
format Online
Article
Text
id pubmed-8086098
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8086098 2021-04-30 Dynamic model updating (DMU) approach for statistical learning model building with missing data Jain, Rahi Xu, Wei BMC Bioinformatics Methodology Article BioMed Central 2021-04-29 /pmc/articles/PMC8086098/ /pubmed/33926384 http://dx.doi.org/10.1186/s12859-021-04138-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title Dynamic model updating (DMU) approach for statistical learning model building with missing data
topic Methodology Article