Effects of influential points and sample size on the selection and replicability of multivariable fractional polynomial models

BACKGROUND: The multivariable fractional polynomial (MFP) approach combines variable selection using backward elimination with a function selection procedure (FSP) for fractional polynomial (FP) functions. It is a relatively simple approach which can be easily understood without advanced training in...

Bibliographic Details
Main authors: Sauerbrei, Willi; Kipruto, Edwin; Balmford, James
Format: Online Article Text
Language: English
Published: BioMed Central 2023
Subjects: Methodology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10111698/
https://www.ncbi.nlm.nih.gov/pubmed/37069621
http://dx.doi.org/10.1186/s41512-023-00145-1
author Sauerbrei, Willi
Kipruto, Edwin
Balmford, James
collection PubMed
description BACKGROUND: The multivariable fractional polynomial (MFP) approach combines variable selection using backward elimination with a function selection procedure (FSP) for fractional polynomial (FP) functions. It is a relatively simple approach which can be easily understood without advanced training in statistical modeling. For continuous variables, a closed test procedure is used to decide between no effect, linear, FP1, or FP2 functions. Influential points (IPs) and small sample sizes can both have a strong impact on a selected function and MFP model.
METHODS: We used simulated data with six continuous and four categorical predictors to illustrate approaches which can help to identify IPs with an influence on function selection and the MFP model. The approaches use leave-one-out or leave-two-out analyses and two related techniques for a multivariable assessment. In eight subsamples, we also investigated the effects of sample size and model replicability, the latter by using three non-overlapping subsamples with the same sample size. For better illustration, a structured profile was used to provide an overview of all analyses conducted.
RESULTS: The results showed that one or more IPs can drive the functions and models selected. In addition, with a small sample size, MFP was not able to detect some non-linear functions and the selected model differed substantially from the true underlying model. However, when the sample size was relatively large and regression diagnostics were carefully conducted, MFP selected functions or models that were similar to the underlying true model.
CONCLUSIONS: For smaller sample sizes, IPs and low power are important reasons that the MFP approach may not be able to identify underlying functional relationships for continuous variables, and selected models might differ substantially from the true model. However, for larger sample sizes, a carefully conducted MFP analysis is often a suitable way to select a multivariable regression model which includes continuous variables. In such a case, MFP can be the preferred approach to derive a multivariable descriptive model.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s41512-023-00145-1.
format Online
Article
Text
id pubmed-10111698
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-10111698 2023-04-19 Diagn Progn Res Methodology BioMed Central 2023-04-18 /pmc/articles/PMC10111698/ /pubmed/37069621 http://dx.doi.org/10.1186/s41512-023-00145-1 Text en © The Author(s) 2023
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Effects of influential points and sample size on the selection and replicability of multivariable fractional polynomial models
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10111698/
https://www.ncbi.nlm.nih.gov/pubmed/37069621
http://dx.doi.org/10.1186/s41512-023-00145-1