Dynamic model updating (DMU) approach for statistical learning model building with missing data
BACKGROUND: Developing statistical and machine learning methods for studies with missing information is a ubiquitous challenge in real-world biological research. Strategies in the literature rely on either removing the samples with missing values, as in complete case analysis (CCA), or imputing the info...
Main authors: | Jain, Rahi; Xu, Wei |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8086098/ https://www.ncbi.nlm.nih.gov/pubmed/33926384 http://dx.doi.org/10.1186/s12859-021-04138-z |
_version_ | 1783686457725026304 |
---|---|
author | Jain, Rahi; Xu, Wei |
collection | PubMed |
description | BACKGROUND: Developing statistical and machine learning methods for studies with missing information is a ubiquitous challenge in real-world biological research. Strategies in the literature either remove the samples with missing values, as in complete case analysis (CCA), or impute the missing information, for example with predictive mean matching (PMM) as implemented in MICE. Limitations of these strategies include information loss and the question of how closely the imputed values approximate the missing values. Further, when medical data are collected piecemeal, these strategies must wait until data collection is complete before a full dataset is available for statistical modeling. METHOD AND RESULTS: This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to build the statistical models. It uses hierarchical clustering to segment the original dataset into small complete datasets and then fits a Bayesian regression to each of these small complete datasets. Predictor estimates are updated using the posterior estimates from each dataset. The performance of DMU was evaluated on both simulated data and real studies, where it showed results better than or on par with other approaches such as CCA and PMM. CONCLUSION: The DMU approach provides an alternative to the existing approaches of information elimination and imputation for processing datasets with missing values. Although the study applied the approach to continuous cross-sectional data, the approach can also be applied to longitudinal, categorical and time-to-event biological data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04138-z. |
format | Online Article Text |
id | pubmed-8086098 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-8086098 2021-04-30. Dynamic model updating (DMU) approach for statistical learning model building with missing data. Jain, Rahi; Xu, Wei. BMC Bioinformatics, Methodology Article. BioMed Central, 2021-04-29. /pmc/articles/PMC8086098/ /pubmed/33926384 http://dx.doi.org/10.1186/s12859-021-04138-z. Text, en. © The Author(s) 2021. https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Dynamic model updating (DMU) approach for statistical learning model building with missing data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8086098/ https://www.ncbi.nlm.nih.gov/pubmed/33926384 http://dx.doi.org/10.1186/s12859-021-04138-z |
work_keys_str_mv | AT jainrahi dynamicmodelupdatingdmuapproachforstatisticallearningmodelbuildingwithmissingdata AT xuwei dynamicmodelupdatingdmuapproachforstatisticallearningmodelbuildingwithmissingdata |
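The description above outlines DMU in three steps: segment the dataset into small complete datasets, fit a Bayesian regression to each, and carry each posterior forward as the prior for the next. The Python sketch below illustrates that idea under loudly stated assumptions; it is not the authors' implementation. It swaps hierarchical clustering for a simpler grouping of rows by missingness pattern (the hypothetical helper `segment_by_pattern`), assumes a Gaussian linear model with a known noise variance and a fully observed outcome, processes segments with more observed predictors first, and resets covariances between updated and non-updated coefficients to zero. All function names and defaults (`prior_var`, `noise_var`) are illustrative.

```python
from collections import defaultdict


def inverse(a):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def matvec(a, v):
    """Multiply matrix a by vector v."""
    return [sum(x * y for x, y in zip(row, v)) for row in a]


def segment_by_pattern(rows):
    """Hypothetical stand-in for hierarchical clustering: rows sharing the same set of
    observed predictors form one small complete dataset. Each row is (x, y) with
    None marking a missing predictor; y is assumed fully observed."""
    groups = defaultdict(list)
    for x, y in rows:
        observed = tuple(j for j, v in enumerate(x) if v is not None)
        groups[observed].append(([float(x[j]) for j in observed], float(y)))
    return groups


def bayes_update(mean, cov, X, y, noise_var):
    """Conjugate Gaussian update for linear regression with known noise variance s2:
    prec_post = prec_prior + X'X / s2,
    mean_post = cov_post (prec_prior mean_prior + X'y / s2)."""
    k = len(mean)
    prior_prec = inverse(cov)
    post_prec = [[prior_prec[i][j] + sum(r[i] * r[j] for r in X) / noise_var
                  for j in range(k)] for i in range(k)]
    rhs = [pm + sum(r[i] * t for r, t in zip(X, y)) / noise_var
           for i, pm in enumerate(matvec(prior_prec, mean))]
    post_cov = inverse(post_prec)
    return matvec(post_cov, rhs), post_cov


def dmu_fit(rows, n_predictors, prior_var=100.0, noise_var=1.0):
    """Update coefficient estimates segment by segment (index 0 is the intercept)."""
    p = n_predictors + 1
    mean = [0.0] * p
    cov = [[prior_var if i == j else 0.0 for j in range(p)] for i in range(p)]
    groups = segment_by_pattern(rows)
    # Assumed ordering: segments with more observed predictors are processed first.
    for observed in sorted(groups, key=len, reverse=True):
        idx = [0] + [j + 1 for j in observed]  # intercept plus observed predictors
        X = [[1.0] + xs for xs, _ in groups[observed]]
        y = [t for _, t in groups[observed]]
        # Prior for this segment: the current posterior restricted to its coefficients.
        sub_mean = [mean[i] for i in idx]
        sub_cov = [[cov[i][j] for j in idx] for i in idx]
        new_mean, new_cov = bayes_update(sub_mean, sub_cov, X, y, noise_var)
        # Simplification: drop covariances between updated and non-updated coefficients.
        for i in idx:
            for j in range(p):
                cov[i][j] = cov[j][i] = 0.0
        for a, i in enumerate(idx):
            mean[i] = new_mean[a]
            for b, j in enumerate(idx):
                cov[i][j] = new_cov[a][b]
    return mean, cov


if __name__ == "__main__":
    # Toy data (illustrative only): y = 1 + 2*x1 - 3*x2, with x2 missing in every third row.
    rows = []
    for i in range(30):
        x1, x2 = i / 10, (i * 7 % 10) / 10
        rows.append(([x1, None if i % 3 == 0 else x2], 1 + 2 * x1 - 3 * x2))
    mean, _ = dmu_fit(rows, n_predictors=2)
    print("intercept, b1, b2 =", [round(b, 3) for b in mean])
```

When no values are missing, `segment_by_pattern` returns a single segment and `dmu_fit` reduces to ordinary conjugate Bayesian linear regression. When a segment omits a predictor, fitting it on the remaining predictors can bias their estimates; this sketch does not attempt to correct for that.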