Impact of the Choice of Cross-Validation Techniques on the Results of Machine Learning-Based Diagnostic Applications

Bibliographic Details
Main authors:	Tougui, Ilias; Jilbab, Abdelilah; El Mhamdi, Jamal
Format:	Online Article Text
Language:	English
Published:	Korean Society of Medical Informatics 2021
Subjects:	Original Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8369053/ https://www.ncbi.nlm.nih.gov/pubmed/34384201 http://dx.doi.org/10.4258/hir.2021.27.3.189

collection	PubMed
description	OBJECTIVES: With advances in data availability and computing capabilities, artificial intelligence and machine learning technologies have evolved rapidly in recent years. Researchers have taken advantage of these developments in healthcare informatics and created reliable tools to predict or classify diseases using machine learning-based algorithms. To correctly quantify the performance of those algorithms, the standard approach is to use cross-validation, where the algorithm is trained on a training set and its performance is measured on a validation set. Both sets should be subject-independent (no subject contributes records to both) to simulate the expected behavior of a clinical study. This study compares two cross-validation strategies, the subject-wise and the record-wise techniques; the subject-wise strategy correctly mimics the process of a clinical study, while the record-wise strategy does not.
METHODS: We started by creating a dataset of smartphone audio recordings of subjects with and without a diagnosis of Parkinson’s disease. This dataset was then divided into training and holdout sets using subject-wise and record-wise divisions. The training set was used to compare six cross-validation techniques, each simulating either the subject-wise or the record-wise process, by measuring the performance of two classifiers (support vector machine and random forest). The holdout set was used to calculate the true error of the classifiers.
RESULTS: The record-wise division and the record-wise cross-validation techniques overestimated the performance of the classifiers and underestimated the classification error.
CONCLUSIONS: In a diagnostic scenario, the subject-wise technique is the proper way of estimating a model’s performance, and record-wise techniques should be avoided.
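The abstract contrasts subject-wise and record-wise splitting. The following is a minimal pure-Python sketch of that difference. It is not the authors' code, and it rests on several assumptions. Each record is assumed to be a hypothetical `(subject_id, features, label)` tuple. The function names (`record_wise_kfold`, `subject_wise_kfold`, `subject_wise_holdout`, `leaked_subjects`), the round-robin fold assignment and the 20% holdout fraction are all illustrative choices.

```python
import random
from collections import defaultdict

# Hypothetical record layout (assumption): (subject_id, features, label).
# Several records (e.g. audio recordings) can belong to the same subject.

def record_wise_kfold(records, k, seed=0):
    """Record-wise CV: shuffle individual records and deal them into k folds.
    Records of one subject can end up in both training and validation folds."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = sorted(j for fold in folds[:i] + folds[i + 1:] for j in fold)
        yield train, sorted(folds[i])

def subject_wise_kfold(records, k, seed=0):
    """Subject-wise CV: shuffle subjects and deal them into k folds, so every
    record of a given subject lands in the same fold."""
    by_subject = defaultdict(list)
    for j, (subject_id, _features, _label) in enumerate(records):
        by_subject[subject_id].append(j)
    subjects = sorted(by_subject)
    if k > len(subjects):
        raise ValueError("k cannot exceed the number of subjects")
    random.Random(seed).shuffle(subjects)
    subject_folds = [subjects[i::k] for i in range(k)]
    for i in range(k):
        val = sorted(j for s in subject_folds[i] for j in by_subject[s])
        train = sorted(
            j
            for fold in subject_folds[:i] + subject_folds[i + 1:]
            for s in fold
            for j in by_subject[s]
        )
        yield train, val

def subject_wise_holdout(records, holdout_fraction=0.2, seed=0):
    """Subject-wise train/holdout division: whole subjects are held out."""
    subjects = sorted({r[0] for r in records})
    random.Random(seed).shuffle(subjects)
    held = set(subjects[: max(1, round(len(subjects) * holdout_fraction))])
    train = [j for j, r in enumerate(records) if r[0] not in held]
    holdout = [j for j, r in enumerate(records) if r[0] in held]
    return train, holdout

def leaked_subjects(records, train, val):
    """Subjects that contribute records to both sides of a split."""
    return {records[j][0] for j in train} & {records[j][0] for j in val}

if __name__ == "__main__":
    # Toy data: 6 hypothetical subjects with 5 recordings each.
    records = [(f"S{s}", [float(r)], s % 2) for s in range(6) for r in range(5)]
    for name, splitter in [("record-wise", record_wise_kfold),
                           ("subject-wise", subject_wise_kfold)]:
        for train, val in splitter(records, k=3):
            # Subject-wise folds always report 0 shared subjects.
            print(name, len(leaked_subjects(records, train, val)), "shared subjects")
```

The record-wise splitter lets recordings of the same subject appear on both sides of a split. According to the abstract, this kind of overlap is what led record-wise techniques to overestimate classifier performance. In practice, scikit-learn's `GroupKFold`, with `groups` set to the subject IDs, is one existing way to obtain subject-wise folds.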
id	pubmed-8369053
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	Healthc Inform Res (Korean Society of Medical Informatics), 2021-07; 2021-07-31
rights	© 2021 The Korean Society of Medical Informatics. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.