
Inconsistency in the use of the term “validation” in studies reporting the performance of deep learning algorithms in providing diagnosis from medical imaging

BACKGROUND: The development of deep learning (DL) algorithms is a three-step process: training, tuning, and testing. Studies are inconsistent in the use of the term “validation”, with some using it to refer to tuning and others testing, which hinders accurate delivery of information and may inadvertently exaggerate the performance of DL algorithms. We investigated the extent of inconsistency in usage of the term “validation” in studies on the accuracy of DL algorithms in providing diagnosis from medical imaging.

METHODS AND FINDINGS: We analyzed the full texts of research papers cited in two recent systematic reviews. The papers were categorized according to whether the term “validation” was used to refer to tuning alone, both tuning and testing, or testing alone. We analyzed whether paper characteristics (i.e., journal category, field of study, year of print publication, journal impact factor [JIF], and nature of test data) were associated with the usage of the terminology using multivariable logistic regression analysis with generalized estimating equations. Of 201 papers published in 125 journals, 118 (58.7%), 9 (4.5%), and 74 (36.8%) used the term to refer to tuning alone, both tuning and testing, and testing alone, respectively. A weak association was noted between higher JIF and using the term to refer to testing (i.e., testing alone or both tuning and testing) instead of tuning alone (vs. JIF <5; JIF 5 to 10: adjusted odds ratio 2.11, P = 0.042; JIF >10: adjusted odds ratio 2.41, P = 0.089). Journal category, field of study, year of print publication, and nature of test data were not significantly associated with the terminology usage.

CONCLUSIONS: Existing literature has a significant degree of inconsistency in using the term “validation” when referring to the steps in DL algorithm development. Efforts are needed to improve the accuracy and clarity in the terminology usage.
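To make the terminology concrete, here is a minimal Python sketch (not taken from the paper; the dataset, split ratios, and variable names are illustrative assumptions) of the three-way data split the abstract describes: the model is fit on the training set, hyperparameters are chosen on the tuning set, which is the split many papers label “validation”, and diagnostic performance is reported only on the held-out test set.

```python
# Minimal sketch of a training / tuning ("validation") / test split.
# All data here are synthetic placeholders, not from the study.
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 32))    # e.g., image-derived features (assumed)
y = rng.integers(0, 2, size=1000)  # binary diagnostic labels (assumed)

# First split off a held-out test set, used only for the final performance estimate.
X_trainval, X_test, y_trainval, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

# Then split the remainder into a training set and a tuning set.
# The tuning set is what many papers call the "validation" set: it is used for
# hyperparameter selection and early stopping, not for reporting accuracy.
X_train, X_tune, y_train, y_tune = train_test_split(
    X_trainval, y_trainval, test_size=0.25, random_state=0, stratify=y_trainval
)

print(len(X_train), len(X_tune), len(X_test))  # 600 200 200
```

Reporting performance measured on the tuning split as if it were test-set performance is what can exaggerate a DL algorithm's accuracy, which is the risk the abstract highlights.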


Bibliographic Details
Main Authors: Kim, Dong Wook, Jang, Hye Young, Ko, Yousun, Son, Jung Hee, Kim, Pyeong Hwa, Kim, Seon-Ok, Lim, Joon Seo, Park, Seong Ho
Format: Online Article Text
Language: English
Published: Public Library of Science, 2020
Subjects: Research Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7485764/
https://www.ncbi.nlm.nih.gov/pubmed/32915901
http://dx.doi.org/10.1371/journal.pone.0238908
Collection: PubMed
Record ID: pubmed-7485764
Institution: National Center for Biotechnology Information
Record Format: MEDLINE/PubMed
Published in: PLoS One, 2020-09-11
© 2020 Kim et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.