Validation pipeline for machine learning algorithm assessment for multiple vendors
A standardized objective evaluation method is needed to compare machine learning (ML) algorithms as these tools become available for clinical use. Therefore, we designed, built, and tested an evaluation pipeline with the goal of normalizing performance measurement of independently developed algorithms, using a common test dataset of our clinical imaging. Three vendor applications for detecting solid, part-solid, and groundglass lung nodules in chest CT examinations were assessed in this retrospective study using our data-preprocessing and algorithm assessment chain. The pipeline included tools for image cohort creation and de-identification; report and image annotation for ground-truth labeling; server partitioning to receive vendor "black box" algorithms and to enable model testing on our internal clinical data (100 chest CTs with 243 nodules) from within our security firewall; model validation and result visualization; and performance assessment calculating algorithm recall, precision, and receiver operating characteristic (ROC) curves. Algorithm true positives, false positives, false negatives, recall, and precision for detecting lung nodules were as follows: Vendor-1 (194, 23, 49, 0.80, 0.89); Vendor-2 (182, 270, 61, 0.75, 0.40); Vendor-3 (75, 120, 168, 0.32, 0.39). The AUCs for detection of solid (0.61–0.74), groundglass (0.66–0.86), and part-solid (0.52–0.86) nodules varied between the three vendors. Our ML model validation pipeline enabled testing of multi-vendor algorithms within the institutional firewall. Wide variations in algorithm performance for detection as well as classification of lung nodules justify the premise for a standardized objective ML algorithm evaluation process.
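The reported recall and precision figures follow from the per-vendor detection counts via the standard definitions (recall = TP / (TP + FN), precision = TP / (TP + FP)); a minimal sketch recomputing them, assuming those usual formulas (the paper may round per nodule category, so small rounding differences are possible):

```python
# Recompute recall and precision from the per-vendor counts reported in the abstract.
# Standard definitions assumed: recall = TP / (TP + FN), precision = TP / (TP + FP).
counts = {
    "Vendor-1": {"tp": 194, "fp": 23, "fn": 49},
    "Vendor-2": {"tp": 182, "fp": 270, "fn": 61},
    "Vendor-3": {"tp": 75, "fp": 120, "fn": 168},
}

def recall(tp: int, fn: int) -> float:
    """Fraction of annotated nodules the algorithm found."""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Fraction of the algorithm's detections that were real nodules."""
    return tp / (tp + fp)

for vendor, c in counts.items():
    print(
        vendor,
        "recall:", round(recall(c["tp"], c["fn"]), 2),
        "precision:", round(precision(c["tp"], c["fp"]), 2),
    )
```

Note that TP + FN sums to the 243 annotated nodules for every vendor (194 + 49 = 182 + 61 = 75 + 168 = 243), consistent with the test set described in the abstract.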
| Main Authors | Bizzo, Bernardo C.; Ebrahimian, Shadi; Walters, Mark E.; Michalski, Mark H.; Andriole, Katherine P.; Dreyer, Keith J.; Kalra, Mannudeep K.; Alkasab, Tarik; Digumarthy, Subba R. |
|---|---|
| Format | Online Article Text |
| Language | English |
| Published | Public Library of Science, 2022 |
| Subjects | |
| Online Access | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053776/ https://www.ncbi.nlm.nih.gov/pubmed/35486572 http://dx.doi.org/10.1371/journal.pone.0267213 |
_version_ | 1784697044585152512 |
author | Bizzo, Bernardo C. Ebrahimian, Shadi Walters, Mark E. Michalski, Mark H. Andriole, Katherine P. Dreyer, Keith J. Kalra, Mannudeep K. Alkasab, Tarik Digumarthy, Subba R. |
author_facet | Bizzo, Bernardo C. Ebrahimian, Shadi Walters, Mark E. Michalski, Mark H. Andriole, Katherine P. Dreyer, Keith J. Kalra, Mannudeep K. Alkasab, Tarik Digumarthy, Subba R. |
author_sort | Bizzo, Bernardo C. |
collection | PubMed |
description | A standardized objective evaluation method is needed to compare machine learning (ML) algorithms as these tools become available for clinical use. Therefore, we designed, built, and tested an evaluation pipeline with the goal of normalizing performance measurement of independently developed algorithms, using a common test dataset of our clinical imaging. Three vendor applications for detecting solid, part-solid, and groundglass lung nodules in chest CT examinations were assessed in this retrospective study using our data-preprocessing and algorithm assessment chain. The pipeline included tools for image cohort creation and de-identification; report and image annotation for ground-truth labeling; server partitioning to receive vendor “black box” algorithms and to enable model testing on our internal clinical data (100 chest CTs with 243 nodules) from within our security firewall; model validation and result visualization; and performance assessment calculating algorithm recall, precision, and receiver operating characteristic curves (ROC). Algorithm true positives, false positives, false negatives, recall, and precision for detecting lung nodules were as follows: Vendor-1 (194, 23, 49, 0.80, 0.89); Vendor-2 (182, 270, 61, 0.75, 0.40); Vendor-3 (75, 120, 168, 0.32, 0.39). The AUCs for detection of solid (0.61–0.74), groundglass (0.66–0.86) and part-solid (0.52–0.86) nodules varied between the three vendors. Our ML model validation pipeline enabled testing of multi-vendor algorithms within the institutional firewall. Wide variations in algorithm performance for detection as well as classification of lung nodules justifies the premise for a standardized objective ML algorithm evaluation process. |
format | Online Article Text |
id | pubmed-9053776 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-90537762022-04-30 Validation pipeline for machine learning algorithm assessment for multiple vendors Bizzo, Bernardo C. Ebrahimian, Shadi Walters, Mark E. Michalski, Mark H. Andriole, Katherine P. Dreyer, Keith J. Kalra, Mannudeep K. Alkasab, Tarik Digumarthy, Subba R. PLoS One Research Article A standardized objective evaluation method is needed to compare machine learning (ML) algorithms as these tools become available for clinical use. Therefore, we designed, built, and tested an evaluation pipeline with the goal of normalizing performance measurement of independently developed algorithms, using a common test dataset of our clinical imaging. Three vendor applications for detecting solid, part-solid, and groundglass lung nodules in chest CT examinations were assessed in this retrospective study using our data-preprocessing and algorithm assessment chain. The pipeline included tools for image cohort creation and de-identification; report and image annotation for ground-truth labeling; server partitioning to receive vendor “black box” algorithms and to enable model testing on our internal clinical data (100 chest CTs with 243 nodules) from within our security firewall; model validation and result visualization; and performance assessment calculating algorithm recall, precision, and receiver operating characteristic curves (ROC). Algorithm true positives, false positives, false negatives, recall, and precision for detecting lung nodules were as follows: Vendor-1 (194, 23, 49, 0.80, 0.89); Vendor-2 (182, 270, 61, 0.75, 0.40); Vendor-3 (75, 120, 168, 0.32, 0.39). The AUCs for detection of solid (0.61–0.74), groundglass (0.66–0.86) and part-solid (0.52–0.86) nodules varied between the three vendors. Our ML model validation pipeline enabled testing of multi-vendor algorithms within the institutional firewall. 
Wide variations in algorithm performance for detection as well as classification of lung nodules justifies the premise for a standardized objective ML algorithm evaluation process. Public Library of Science 2022-04-29 /pmc/articles/PMC9053776/ /pubmed/35486572 http://dx.doi.org/10.1371/journal.pone.0267213 Text en © 2022 Bizzo et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Bizzo, Bernardo C. Ebrahimian, Shadi Walters, Mark E. Michalski, Mark H. Andriole, Katherine P. Dreyer, Keith J. Kalra, Mannudeep K. Alkasab, Tarik Digumarthy, Subba R. Validation pipeline for machine learning algorithm assessment for multiple vendors |
title | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_full | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_fullStr | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_full_unstemmed | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_short | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_sort | validation pipeline for machine learning algorithm assessment for multiple vendors |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053776/ https://www.ncbi.nlm.nih.gov/pubmed/35486572 http://dx.doi.org/10.1371/journal.pone.0267213 |
work_keys_str_mv | AT bizzobernardoc validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT ebrahimianshadi validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT waltersmarke validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT michalskimarkh validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT andriolekatherinep validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT dreyerkeithj validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT kalramannudeepk validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT alkasabtarik validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT digumarthysubbar validationpipelineformachinelearningalgorithmassessmentformultiplevendors |