Monte Carlo cross-validation for a study with binary outcome and limited sample size
Main author: | Shan, Guogen |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9578204/ https://www.ncbi.nlm.nih.gov/pubmed/36253749 http://dx.doi.org/10.1186/s12911-022-02016-z |
author | Shan, Guogen |
---|---|
collection | PubMed |
description | Cross-validation (CV) is a resampling approach to evaluate machine learning models when sample size is limited. The number of all possible combinations of folds for the training data, known as CV rounds, is often very small in leave-one-out CV. Alternatively, Monte Carlo cross-validation (MCCV) can be performed with a flexible number of simulations, as computational resources permit, in a study with limited sample size. We conduct extensive simulation studies to compare the accuracy of MCCV and CV with the same number of simulations for a study with a binary outcome (e.g., disease progression or not). The accuracy of MCCV is generally higher than that of CV, although the gain is small. The two methods perform similarly when sample size is large. Meanwhile, MCCV provides increasingly reliable performance metrics as the number of simulations increases. Two real examples are used to illustrate the comparison between MCCV and CV. |
format | Online Article Text |
id | pubmed-9578204 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
journal | BMC Med Inform Decis Mak (Research) |
published | BioMed Central, 2022-10-17 |
record_date | 2022-10-19 |
rights | © The Author(s) 2022. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | Monte Carlo cross-validation for a study with binary outcome and limited sample size |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9578204/ https://www.ncbi.nlm.nih.gov/pubmed/36253749 http://dx.doi.org/10.1186/s12911-022-02016-z |
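The abstract's contrast between CV rounds and MCCV simulations can be made concrete with a minimal Python sketch. This is an illustration under stated assumptions, not the paper's implementation: it assumes a single continuous predictor, a hypothetical toy threshold classifier (`fit_threshold`), and hypothetical helpers `loocv_splits`, `mccv_splits`, and `cv_accuracy`; the sample size, test size, and simulated data are likewise illustrative choices. Leave-one-out CV yields exactly n rounds for n samples, whereas MCCV draws as many random train/test partitions as the analyst chooses.

```python
import random
from statistics import mean


def loocv_splits(n):
    """Leave-one-out CV: exactly n rounds, each holding out a single sample."""
    for i in range(n):
        train = [j for j in range(n) if j != i]
        yield train, [i]


def mccv_splits(n, test_size, n_sims, seed=0):
    """Monte Carlo CV: n_sims independent random train/test partitions.

    Each round draws test_size indices without replacement; rounds are
    independent, so the same partition may be drawn more than once.
    """
    rng = random.Random(seed)
    idx = list(range(n))
    for _ in range(n_sims):
        test = rng.sample(idx, test_size)
        held_out = set(test)
        train = [j for j in idx if j not in held_out]
        yield train, test


def fit_threshold(x, y):
    """Hypothetical toy classifier for a binary outcome and one predictor.

    Thresholds at the midpoint of the two class means; falls back to a
    constant prediction if the training set contains only one class.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        label = 1 if x1 else 0
        return lambda v: label
    m0, m1 = mean(x0), mean(x1)
    t = (m0 + m1) / 2
    if m1 >= m0:
        return lambda v: int(v > t)
    return lambda v: int(v < t)


def cv_accuracy(x, y, splits):
    """Average held-out accuracy over all rounds; also returns the round count."""
    accs = []
    for train, test in splits:
        model = fit_threshold([x[i] for i in train], [y[i] for i in train])
        accs.append(mean(int(model(x[i]) == y[i]) for i in test))
    return mean(accs), len(accs)


if __name__ == "__main__":
    # Simulated (hypothetical) data: n = 30, balanced binary outcome,
    # predictor mean shifted by one standard deviation between classes.
    rng = random.Random(1)
    n = 30
    y = [i % 2 for i in range(n)]
    x = [rng.gauss(1.0 if yi else 0.0, 1.0) for yi in y]

    loo_acc, loo_rounds = cv_accuracy(x, y, loocv_splits(n))
    mc_acc, mc_rounds = cv_accuracy(x, y, mccv_splits(n, test_size=6, n_sims=1000))
    print(f"LOOCV: {loo_rounds} rounds, accuracy {loo_acc:.3f}")
    print(f"MCCV:  {mc_rounds} rounds, accuracy {mc_acc:.3f}")
```

Running the script prints the round count and mean held-out accuracy for each scheme: LOOCV is fixed at 30 rounds for n = 30, while the MCCV round count is whatever `n_sims` is set to. Because MCCV partitions are drawn independently, a given partition can recur and a given sample may never be held out; that is the trade-off for being able to choose the number of simulations freely.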