Monte Carlo cross-validation for a study with binary outcome and limited sample size

Cross-validation (CV) is a resampling approach for evaluating machine learning models when the sample size is limited. The number of all possible combinations of folds for the training data, known as CV rounds, is often very small in leave-one-out CV. Alternatively, Monte Carlo cross-validation (MCCV) can...
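The difference in the number of available rounds can be made concrete with a minimal Python sketch. It assumes, for illustration only, that MCCV uses a fixed 80/20 train/test split (a choice not taken from the paper). Leave-one-out CV has exactly n distinct rounds. MCCV instead samples each simulation from all C(n, n_train) possible training sets, so the number of simulations can be chosen freely. The function names `loocv_rounds`, `mccv_possible_splits`, and `mccv_splits` are hypothetical.

```python
import math
import random


def loocv_rounds(n: int) -> int:
    # Leave-one-out CV holds out each observation exactly once: n rounds, no more.
    return n


def mccv_possible_splits(n: int, train_fraction: float) -> int:
    # MCCV draws training sets of a fixed size n_train; there are C(n, n_train) of them.
    n_train = round(n * train_fraction)
    return math.comb(n, n_train)


def mccv_splits(n: int, train_fraction: float, n_sims: int, seed: int = 0):
    # Each simulation draws an independent random split, so n_sims can be set freely.
    # Because draws are independent, the same split may occasionally repeat.
    rng = random.Random(seed)
    n_train = round(n * train_fraction)
    splits = []
    for _ in range(n_sims):
        train = sorted(rng.sample(range(n), n_train))
        in_train = set(train)
        test = [i for i in range(n) if i not in in_train]
        splits.append((train, test))
    return splits


if __name__ == "__main__":
    n = 50
    print("LOOCV rounds:", loocv_rounds(n))
    print("Possible MCCV training sets (80/20):", mccv_possible_splits(n, 0.8))
    print("MCCV simulations drawn:", len(mccv_splits(n, 0.8, n_sims=1000)))
```

With n = 50, leave-one-out CV offers only 50 rounds, while an 80/20 MCCV samples from `math.comb(50, 40)` candidate training sets, a number in the billions.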


Bibliographic details
Main author: Shan, Guogen
Format: Online article (text)
Language: English
Published: BioMed Central, 2022
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9578204/
https://www.ncbi.nlm.nih.gov/pubmed/36253749
http://dx.doi.org/10.1186/s12911-022-02016-z
Collection: PubMed
Description: Cross-validation (CV) is a resampling approach for evaluating machine learning models when the sample size is limited. The number of all possible combinations of folds for the training data, known as CV rounds, is often very small in leave-one-out CV. Alternatively, Monte Carlo cross-validation (MCCV) can be performed with a flexible number of simulations when computational resources allow, for a study with limited sample size. We conduct extensive simulation studies to compare accuracy between MCCV and CV with the same number of simulations for a study with a binary outcome (e.g., disease progression or not). The accuracy of MCCV is generally higher than that of CV, although the gain is small. The two methods perform similarly when the sample size is large. In addition, MCCV provides increasingly reliable performance metrics as the number of simulations increases. Two real examples are used to illustrate the comparison between MCCV and CV.
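The kind of comparison described here can be illustrated in miniature with a Python sketch. Everything in it is an assumption for illustration, not the paper's simulation design:

* the data-generating model (a binary outcome y with one normally distributed predictor x),
* the nearest-centroid classifier standing in for a machine learning model,
* n = 40 and K = 5,
* all function names.

The sketch matches the two methods on test-set size (1/K of the sample) and on the number of fitted models, mirroring the idea of comparing them with the same number of simulations. It makes no claim about which method wins.

```python
import random
import statistics


def simulate_binary_data(n, effect=1.0, seed=1):
    # Hypothetical data: y ~ Bernoulli(0.5) and x | y ~ Normal(effect * y, 1).
    rng = random.Random(seed)
    y = [rng.randint(0, 1) for _ in range(n)]
    x = [rng.gauss(effect * yi, 1.0) for yi in y]
    return x, y


def fit_predict(x, y, train, test):
    # Nearest-centroid classifier (a stand-in for any model); returns test-set accuracy.
    groups = {0: [], 1: []}
    for i in train:
        groups[y[i]].append(x[i])
    if not groups[0] or not groups[1]:
        # Only one class in the training data: predict that class for every test case.
        only = 0 if groups[0] else 1
        preds = [only] * len(test)
    else:
        m0 = statistics.fmean(groups[0])
        m1 = statistics.fmean(groups[1])
        preds = [0 if abs(x[i] - m0) <= abs(x[i] - m1) else 1 for i in test]
    return sum(p == y[i] for p, i in zip(preds, test)) / len(test)


def repeated_kfold_cv(x, y, k, n_fits, seed=2):
    # Repeated K-fold CV: reshuffle and re-partition into k folds until n_fits
    # models have been fitted (the last repetition may be partial).
    rng = random.Random(seed)
    n = len(y)
    scores = []
    while len(scores) < n_fits:
        idx = list(range(n))
        rng.shuffle(idx)
        folds = [idx[j::k] for j in range(k)]
        for j in range(k):
            if len(scores) == n_fits:
                break
            train = [i for f, fold in enumerate(folds) if f != j for i in fold]
            scores.append(fit_predict(x, y, train, folds[j]))
    return scores


def mccv(x, y, test_fraction, n_fits, seed=3):
    # Monte Carlo CV: every fit uses an independent random train/test split.
    rng = random.Random(seed)
    n = len(y)
    n_test = max(1, round(n * test_fraction))
    scores = []
    for _ in range(n_fits):
        idx = list(range(n))
        rng.shuffle(idx)
        scores.append(fit_predict(x, y, idx[n_test:], idx[:n_test]))
    return scores


def summarize(scores):
    # Mean accuracy and a naive standard error that treats the fits as independent.
    return statistics.fmean(scores), statistics.stdev(scores) / len(scores) ** 0.5


if __name__ == "__main__":
    x, y = simulate_binary_data(n=40)
    k = 5
    for n_fits in (10, 100, 1000):
        cv_mean, cv_se = summarize(repeated_kfold_cv(x, y, k, n_fits))
        mc_mean, mc_se = summarize(mccv(x, y, 1 / k, n_fits))
        print(f"fits={n_fits:4d}  K-fold CV {cv_mean:.3f} (SE {cv_se:.3f})  "
              f"MCCV {mc_mean:.3f} (SE {mc_se:.3f})")
```

The MCCV splits are drawn independently given the data. Their naive standard error therefore shrinks roughly like 1/sqrt(number of fits), which is one way to see why MCCV metrics stabilize as the number of simulations grows. Folds within one K-fold repetition share observations, so the same formula is only a rough guide for the K-fold column.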
ID: pubmed-9578204
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Med Inform Decis Mak (Research). BioMed Central, 2022-10-17 (PubMed record pubmed-9578204, 2022-10-19).
© The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
Topic: Research