Validation pipeline for machine learning algorithm assessment for multiple vendors
A standardized objective evaluation method is needed to compare machine learning (ML) algorithms as these tools become available for clinical use. Therefore, we designed, built, and tested an evaluation pipeline with the goal of normalizing performance measurement of independently developed algorithms, using a common test dataset of our clinical imaging. Three vendor applications for detecting solid, part-solid, and groundglass lung nodules in chest CT examinations were assessed in this retrospective study using our data-preprocessing and algorithm assessment chain. The pipeline included tools for image cohort creation and de-identification; report and image annotation for ground-truth labeling; server partitioning to receive vendor "black box" algorithms and to enable model testing on our internal clinical data (100 chest CTs with 243 nodules) from within our security firewall; model validation and result visualization; and performance assessment calculating algorithm recall, precision, and receiver operating characteristic (ROC) curves. Algorithm true positives, false positives, false negatives, recall, and precision for detecting lung nodules were as follows: Vendor-1 (194, 23, 49, 0.80, 0.89); Vendor-2 (182, 270, 61, 0.75, 0.40); Vendor-3 (75, 120, 168, 0.32, 0.39). The AUCs for detection of solid (0.61–0.74), groundglass (0.66–0.86), and part-solid (0.52–0.86) nodules varied between the three vendors. Our ML model validation pipeline enabled testing of multi-vendor algorithms within the institutional firewall. Wide variations in algorithm performance for detection as well as classification of lung nodules justify the premise for a standardized objective ML algorithm evaluation process.
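The reported recall and precision figures follow from the per-vendor detection counts via the standard definitions (recall = TP / (TP + FN), precision = TP / (TP + FP)); a minimal sketch recomputing them, assuming those usual formulas (the paper may round per nodule category, so small rounding differences are possible):

```python
# Recompute recall and precision from the per-vendor counts reported in the abstract.
# Standard definitions assumed: recall = TP / (TP + FN), precision = TP / (TP + FP).
counts = {
    "Vendor-1": {"tp": 194, "fp": 23, "fn": 49},
    "Vendor-2": {"tp": 182, "fp": 270, "fn": 61},
    "Vendor-3": {"tp": 75, "fp": 120, "fn": 168},
}

def recall(tp: int, fn: int) -> float:
    """Fraction of annotated nodules the algorithm found."""
    return tp / (tp + fn)

def precision(tp: int, fp: int) -> float:
    """Fraction of the algorithm's detections that were real nodules."""
    return tp / (tp + fp)

for vendor, c in counts.items():
    print(
        vendor,
        "recall:", round(recall(c["tp"], c["fn"]), 2),
        "precision:", round(precision(c["tp"], c["fp"]), 2),
    )
```

Note that TP + FN sums to the 243 annotated nodules for every vendor (194 + 49 = 182 + 61 = 75 + 168 = 243), consistent with the test set described in the abstract.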
| Main Authors | Bizzo, Bernardo C.; Ebrahimian, Shadi; Walters, Mark E.; Michalski, Mark H.; Andriole, Katherine P.; Dreyer, Keith J.; Kalra, Mannudeep K.; Alkasab, Tarik; Digumarthy, Subba R. |
|---|---|
| Format | Online Article Text |
| Language | English |
| Published | Public Library of Science, 2022 |
| Subjects | |
| Online Access | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053776/ https://www.ncbi.nlm.nih.gov/pubmed/35486572 http://dx.doi.org/10.1371/journal.pone.0267213 |
_version_ | 1784697044585152512 |
author | Bizzo, Bernardo C. Ebrahimian, Shadi Walters, Mark E. Michalski, Mark H. Andriole, Katherine P. Dreyer, Keith J. Kalra, Mannudeep K. Alkasab, Tarik Digumarthy, Subba R. |
author_facet | Bizzo, Bernardo C. Ebrahimian, Shadi Walters, Mark E. Michalski, Mark H. Andriole, Katherine P. Dreyer, Keith J. Kalra, Mannudeep K. Alkasab, Tarik Digumarthy, Subba R. |
author_sort | Bizzo, Bernardo C. |
collection | PubMed |
description | A standardized objective evaluation method is needed to compare machine learning (ML) algorithms as these tools become available for clinical use. Therefore, we designed, built, and tested an evaluation pipeline with the goal of normalizing performance measurement of independently developed algorithms, using a common test dataset of our clinical imaging. Three vendor applications for detecting solid, part-solid, and groundglass lung nodules in chest CT examinations were assessed in this retrospective study using our data-preprocessing and algorithm assessment chain. The pipeline included tools for image cohort creation and de-identification; report and image annotation for ground-truth labeling; server partitioning to receive vendor “black box” algorithms and to enable model testing on our internal clinical data (100 chest CTs with 243 nodules) from within our security firewall; model validation and result visualization; and performance assessment calculating algorithm recall, precision, and receiver operating characteristic curves (ROC). Algorithm true positives, false positives, false negatives, recall, and precision for detecting lung nodules were as follows: Vendor-1 (194, 23, 49, 0.80, 0.89); Vendor-2 (182, 270, 61, 0.75, 0.40); Vendor-3 (75, 120, 168, 0.32, 0.39). The AUCs for detection of solid (0.61–0.74), groundglass (0.66–0.86) and part-solid (0.52–0.86) nodules varied between the three vendors. Our ML model validation pipeline enabled testing of multi-vendor algorithms within the institutional firewall. Wide variations in algorithm performance for detection as well as classification of lung nodules justifies the premise for a standardized objective ML algorithm evaluation process. |
format | Online Article Text |
id | pubmed-9053776 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-90537762022-04-30 Validation pipeline for machine learning algorithm assessment for multiple vendors Bizzo, Bernardo C. Ebrahimian, Shadi Walters, Mark E. Michalski, Mark H. Andriole, Katherine P. Dreyer, Keith J. Kalra, Mannudeep K. Alkasab, Tarik Digumarthy, Subba R. PLoS One Research Article A standardized objective evaluation method is needed to compare machine learning (ML) algorithms as these tools become available for clinical use. Therefore, we designed, built, and tested an evaluation pipeline with the goal of normalizing performance measurement of independently developed algorithms, using a common test dataset of our clinical imaging. Three vendor applications for detecting solid, part-solid, and groundglass lung nodules in chest CT examinations were assessed in this retrospective study using our data-preprocessing and algorithm assessment chain. The pipeline included tools for image cohort creation and de-identification; report and image annotation for ground-truth labeling; server partitioning to receive vendor “black box” algorithms and to enable model testing on our internal clinical data (100 chest CTs with 243 nodules) from within our security firewall; model validation and result visualization; and performance assessment calculating algorithm recall, precision, and receiver operating characteristic curves (ROC). Algorithm true positives, false positives, false negatives, recall, and precision for detecting lung nodules were as follows: Vendor-1 (194, 23, 49, 0.80, 0.89); Vendor-2 (182, 270, 61, 0.75, 0.40); Vendor-3 (75, 120, 168, 0.32, 0.39). The AUCs for detection of solid (0.61–0.74), groundglass (0.66–0.86) and part-solid (0.52–0.86) nodules varied between the three vendors. Our ML model validation pipeline enabled testing of multi-vendor algorithms within the institutional firewall. 
Wide variations in algorithm performance for detection as well as classification of lung nodules justifies the premise for a standardized objective ML algorithm evaluation process. Public Library of Science 2022-04-29 /pmc/articles/PMC9053776/ /pubmed/35486572 http://dx.doi.org/10.1371/journal.pone.0267213 Text en © 2022 Bizzo et al https://creativecommons.org/licenses/by/4.0/This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/) , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. |
spellingShingle | Research Article Bizzo, Bernardo C. Ebrahimian, Shadi Walters, Mark E. Michalski, Mark H. Andriole, Katherine P. Dreyer, Keith J. Kalra, Mannudeep K. Alkasab, Tarik Digumarthy, Subba R. Validation pipeline for machine learning algorithm assessment for multiple vendors |
title | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_full | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_fullStr | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_full_unstemmed | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_short | Validation pipeline for machine learning algorithm assessment for multiple vendors |
title_sort | validation pipeline for machine learning algorithm assessment for multiple vendors |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9053776/ https://www.ncbi.nlm.nih.gov/pubmed/35486572 http://dx.doi.org/10.1371/journal.pone.0267213 |
work_keys_str_mv | AT bizzobernardoc validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT ebrahimianshadi validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT waltersmarke validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT michalskimarkh validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT andriolekatherinep validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT dreyerkeithj validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT kalramannudeepk validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT alkasabtarik validationpipelineformachinelearningalgorithmassessmentformultiplevendors AT digumarthysubbar validationpipelineformachinelearningalgorithmassessmentformultiplevendors |