
Qualitative Evaluation of Common Quantitative Metrics for Clinical Acceptance of Automatic Segmentation: a Case Study on Heart Contouring from CT Images by Deep Learning Algorithms

Bibliographic Details
Main authors: van den Oever, L. B., van Veldhuizen, W. A., Cornelissen, L. J., Spoor, D. S., Willems, T. P., Kramer, G., Stigter, T., Rook, M., Crijns, A. P. G., Oudkerk, M., Veldhuis, R. N. J., de Bock, G. H., van Ooijen, P. M. A.
Format: Online Article Text
Language: English
Published: Springer International Publishing 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921356/
https://www.ncbi.nlm.nih.gov/pubmed/35083620
http://dx.doi.org/10.1007/s10278-021-00573-9
author van den Oever, L. B.
van Veldhuizen, W. A.
Cornelissen, L. J.
Spoor, D. S.
Willems, T. P.
Kramer, G.
Stigter, T.
Rook, M.
Crijns, A. P. G.
Oudkerk, M.
Veldhuis, R. N. J.
de Bock, G. H.
van Ooijen, P. M. A.
collection PubMed
description Organs-at-risk contouring is time consuming and labour intensive. Automation by deep learning algorithms would decrease the workload of radiotherapists and technicians considerably. However, the variety of metrics used for the evaluation of deep learning algorithms makes the results of many papers difficult to interpret and compare. In this paper, a qualitative evaluation is done on five established metrics to assess whether their values correlate with clinical usability. A total of 377 CT volumes with heart delineations were randomly selected for training and evaluation. A deep learning algorithm was used to predict the contours of the heart. A total of 101 CT slices from the validation set, with the predicted contours, were shown to three experienced radiologists. Each reader independently examined every slice and indicated whether they would accept or adjust the prediction and whether there were (small) mistakes. For each slice, the scores of this qualitative evaluation were then compared with the Sørensen-Dice coefficient (DC), the Hausdorff distance (HD), pixel-wise accuracy, sensitivity and precision. The statistical analysis of the qualitative evaluation and metrics showed a significant correlation. Of the slices with a DC over 0.96 (N = 20) or a 95% HD under 5 voxels (N = 25), none were rejected by the readers. Contours with lower DC or higher HD were seen among both rejected and accepted contours. The qualitative evaluation shows that it is difficult to use common quantitative metrics as indicators of clinical usability. We might need to change the reporting of quantitative metrics to better reflect clinical acceptance.
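The five metrics named in the abstract can be computed per slice from a predicted and a reference binary mask. The following is a minimal Python sketch under stated assumptions, not the authors' implementation. It assumes that masks are equally sized 2-D lists of 0/1, that both masks are non-empty, that distances are Euclidean in pixel (voxel) units, and that contour points are foreground pixels with at least one 4-connected background neighbour. It also assumes that the 95% HD is the larger of the two directed 95th-percentile contour distances; other tools pool both directions, which can give slightly different values. All helper names (`slice_metrics`, `hd95`, `boundary`, and so on) are hypothetical.

```python
import math
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def foreground(mask: List[List[int]]) -> Set[Pixel]:
    """Return the (row, col) coordinates of all non-zero pixels."""
    return {(r, c) for r, row in enumerate(mask) for c, v in enumerate(row) if v}


def boundary(pixels: Set[Pixel]) -> Set[Pixel]:
    """Contour pixels: foreground pixels with a 4-connected neighbour outside the mask."""
    steps = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return {(r, c) for r, c in pixels
            if any((r + dr, c + dc) not in pixels for dr, dc in steps)}


def percentile(values: List[float], q: float) -> float:
    """q-th percentile with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)


def directed_distances(src: Set[Pixel], dst: Set[Pixel]) -> List[float]:
    """For each point in src, the Euclidean distance to the nearest point in dst."""
    return [min(math.dist(p, d) for d in dst) for p in src]


def hd95(pred: Set[Pixel], ref: Set[Pixel]) -> float:
    """Assumed convention: max of the two directed 95th-percentile contour distances."""
    bp, br = boundary(pred), boundary(ref)
    return max(percentile(directed_distances(bp, br), 95),
               percentile(directed_distances(br, bp), 95))


def slice_metrics(pred_mask: List[List[int]],
                  ref_mask: List[List[int]]) -> Dict[str, float]:
    """DC, 95% HD, pixel-wise accuracy, sensitivity and precision for one slice.

    Assumes both masks have the same shape and contain at least one foreground pixel.
    """
    pred, ref = foreground(pred_mask), foreground(ref_mask)
    n_pixels = len(ref_mask) * len(ref_mask[0])
    tp = len(pred & ref)            # predicted and in reference
    fp = len(pred - ref)            # predicted but not in reference
    fn = len(ref - pred)            # in reference but missed
    tn = n_pixels - tp - fp - fn    # background in both
    return {
        "dice": 2 * tp / (len(pred) + len(ref)),
        "hd95": hd95(pred, ref),
        "accuracy": (tp + tn) / n_pixels,
        "sensitivity": tp / (tp + fn),
        "precision": tp / (tp + fp),
    }
```

Consider a 4×4 square reference in a 6×6 slice and a prediction shifted one pixel to the right. For that case the sketch gives DC = 0.75, 95% HD = 1.0 pixel, accuracy ≈ 0.78, sensitivity = 0.75 and precision = 0.75. The brute-force nearest-point search is quadratic in the number of contour pixels, so a distance transform would scale better on full-resolution CT slices.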
format Online
Article
Text
id pubmed-8921356
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-8921356 2022-03-25 J Digit Imaging. Springer International Publishing, published online 2022-01-26, issue 2022-04. © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/.
title Qualitative Evaluation of Common Quantitative Metrics for Clinical Acceptance of Automatic Segmentation: a Case Study on Heart Contouring from CT Images by Deep Learning Algorithms
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8921356/
https://www.ncbi.nlm.nih.gov/pubmed/35083620
http://dx.doi.org/10.1007/s10278-021-00573-9