
Dynamic model updating (DMU) approach for statistical learning model building with missing data

BACKGROUND: Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. The strategy in literature relies on either removing the samples with missing values like complete case analysis (CCA) or imputing the info...

Bibliographic Details
Main authors:	Jain, Rahi; Xu, Wei
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2021
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8086098/ https://www.ncbi.nlm.nih.gov/pubmed/33926384 http://dx.doi.org/10.1186/s12859-021-04138-z

author	Jain, Rahi Xu, Wei
collection	PubMed
description	BACKGROUND: Developing statistical and machine learning methods on studies with missing information is a ubiquitous challenge in real-world biological research. The strategies in the literature rely on either removing the samples with missing values, as in complete case analysis (CCA), or imputing the missing information, as in predictive mean matching (PMM), for example as implemented in MICE. Limitations of these strategies include information loss and uncertainty about how close the imputed values are to the true values. Further, in scenarios with piecemeal medical data, these strategies have to wait until data collection is finished before a complete dataset is available for statistical modelling. METHOD AND RESULTS: This study proposes a dynamic model updating (DMU) approach, a different strategy for developing statistical models with missing data. DMU uses only the information available in the dataset to prepare the statistical models. The study uses hierarchical clustering to segment the original dataset into small complete datasets, followed by Bayesian regression on each of the small complete datasets. Predictor estimates are updated using the posterior estimates from each dataset. The performance of DMU is evaluated using both simulated data and real studies, and it shows results better than or on par with other approaches such as CCA and PMM. CONCLUSION: The DMU approach provides an alternative to the existing approaches of information elimination and imputation for processing datasets with missing values. While the study applied the approach to continuous cross-sectional data, the approach can be applied to longitudinal, categorical and time-to-event biological data. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04138-z.
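
The updating step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that are not in the record: rows are grouped by their missingness pattern as a simple stand-in for the hierarchical clustering the study uses; the regression is a conjugate Gaussian linear model with a known noise variance and no intercept; and each block's likelihood is taken to depend only on the coefficients of the predictors observed in that block. All function names and parameters (`split_by_missing_pattern`, `update_posterior`, `dmu_sketch`, `prior_var`, `noise_var`) are hypothetical.

```python
import numpy as np


def split_by_missing_pattern(X, y):
    """Group rows by which predictors are observed.

    Stand-in for the study's hierarchical clustering (an assumption of this
    sketch). Each returned block is complete on its own subset of columns.
    """
    observed = ~np.isnan(X)
    groups = {}
    for i, row in enumerate(observed):
        groups.setdefault(tuple(row), []).append(i)
    blocks = []
    for pattern, rows in groups.items():
        cols = np.flatnonzero(pattern)
        if cols.size == 0:
            continue  # rows with no observed predictor carry no information here
        blocks.append((cols, X[np.ix_(rows, cols)], y[rows]))
    return blocks


def update_posterior(precision, info, cols, X_block, y_block, noise_var=1.0):
    """One conjugate Gaussian update in information form.

    precision is the posterior precision matrix and info = precision @ mean.
    Only the entries for the observed coefficients (cols) change; the other
    coefficients keep their current prior.
    """
    precision = precision.copy()
    info = info.copy()
    precision[np.ix_(cols, cols)] += X_block.T @ X_block / noise_var
    info[cols] += X_block.T @ y_block / noise_var
    return precision, info


def dmu_sketch(X, y, prior_var=10.0, noise_var=1.0):
    """Feed the posterior from each complete block in as the next block's prior."""
    p = X.shape[1]
    precision = np.eye(p) / prior_var  # zero-mean Gaussian prior (assumed)
    info = np.zeros(p)
    for cols, X_block, y_block in split_by_missing_pattern(X, y):
        precision, info = update_posterior(
            precision, info, cols, X_block, y_block, noise_var
        )
    cov = np.linalg.inv(precision)
    return cov @ info, cov  # posterior mean and covariance
```

With this conjugate model the order of the blocks does not matter, and on fully observed data the sequential result equals the posterior from a single fit on all rows. That property is what allows estimates to be refreshed as piecemeal data arrives instead of waiting for a complete dataset.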
format	Online Article Text
id	pubmed-8086098
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-8086098 2021-04-30 Dynamic model updating (DMU) approach for statistical learning model building with missing data Jain, Rahi Xu, Wei BMC Bioinformatics Methodology Article BioMed Central 2021-04-29 /pmc/articles/PMC8086098/ /pubmed/33926384 http://dx.doi.org/10.1186/s12859-021-04138-z Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title	Dynamic model updating (DMU) approach for statistical learning model building with missing data
topic	Methodology Article
