
Learning from multiple annotators for medical image segmentation

Supervised machine learning methods have been widely developed for segmentation tasks in recent years. However, the quality of labels has a high impact on the predictive performance of these algorithms. This issue is particularly acute in the medical image domain, where both the cost of annotation and the inter-observer variability are high.


Bibliographic Details
Main Authors: Zhang, Le, Tanno, Ryutaro, Xu, Moucheng, Huang, Yawen, Bronik, Kevin, Jin, Chen, Jacob, Joseph, Zheng, Yefeng, Shao, Ling, Ciccarelli, Olga, Barkhof, Frederik, Alexander, Daniel C.
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10533416/
https://www.ncbi.nlm.nih.gov/pubmed/37781685
http://dx.doi.org/10.1016/j.patcog.2023.109400
_version_ 1785112187585429504
author Zhang, Le
Tanno, Ryutaro
Xu, Moucheng
Huang, Yawen
Bronik, Kevin
Jin, Chen
Jacob, Joseph
Zheng, Yefeng
Shao, Ling
Ciccarelli, Olga
Barkhof, Frederik
Alexander, Daniel C.
author_facet Zhang, Le
Tanno, Ryutaro
Xu, Moucheng
Huang, Yawen
Bronik, Kevin
Jin, Chen
Jacob, Joseph
Zheng, Yefeng
Shao, Ling
Ciccarelli, Olga
Barkhof, Frederik
Alexander, Daniel C.
author_sort Zhang, Le
collection PubMed
description Supervised machine learning methods have been widely developed for segmentation tasks in recent years. However, the quality of labels has a high impact on the predictive performance of these algorithms. This issue is particularly acute in the medical image domain, where both the cost of annotation and the inter-observer variability are high. In a typical label acquisition process, different human experts contribute estimates of the “actual” segmentation labels, influenced by their personal biases and competency levels. The performance of automatic segmentation algorithms is limited when these noisy labels are used as the expert consensus label. In this work, we use two coupled CNNs to jointly learn, from purely noisy observations alone, the reliability of individual annotators and the expert consensus label distributions. The separation of the two is achieved by maximally describing the annotator’s “unreliable behavior” (we call it “maximally unreliable”) while achieving high fidelity with the noisy training data. We first create a toy segmentation dataset using MNIST and investigate the properties of the proposed algorithm. We then use three public medical imaging segmentation datasets to demonstrate our method’s efficacy, including both simulated (where necessary) and real-world annotations: 1) ISBI2015 (multiple-sclerosis lesions); 2) BraTS (brain tumors); 3) LIDC-IDRI (lung abnormalities). Finally, we create a real-world multiple sclerosis lesion dataset (QSMSC at UCL: Queen Square Multiple Sclerosis Center at UCL, UK) with manual segmentations from 4 different annotators (3 radiologists with different skill levels and 1 expert to generate the expert consensus label). In all datasets, our method consistently outperforms competing methods and relevant baselines, especially when the number of annotations is small and the amount of disagreement is large. The studies also reveal that the system is capable of capturing the complicated spatial characteristics of annotators’ mistakes.
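The coupled-CNN idea in the description can be sketched per pixel as follows. This is a minimal, hypothetical NumPy sketch, not the paper's code: one network is assumed to output consensus class probabilities `p`, the other a per-annotator confusion matrix `A_r`, and the trace penalty stands in for the "maximally unreliable" separation term the description mentions.

```python
import numpy as np

# Hypothetical per-pixel loss for one annotator r (illustrative names):
#   p   : (C,)  consensus class probabilities from the segmentation CNN
#   A_r : (C,C) column-stochastic confusion matrix from the annotator CNN,
#         A_r[i, j] = P(annotator r labels class i | true class j)
#   y_r : annotator r's observed (noisy) label at this pixel
# The predicted noisy-label distribution is q_r = A_r @ p; training
# minimizes cross-entropy of q_r against y_r plus a trace penalty that
# pushes A_r away from the identity ("maximally unreliable") unless the
# data demands fidelity.
def annotator_loss(p, A_r, y_r, lam=0.01):
    q_r = A_r @ p                        # noisy-label distribution
    ce = -np.log(q_r[y_r] + 1e-12)      # cross-entropy with noisy label
    return ce + lam * np.trace(A_r)     # trace term drives the separation

# Toy check: a perfectly reliable annotator has an identity confusion matrix.
p = np.array([0.9, 0.1])
A_identity = np.eye(2)
loss = annotator_loss(p, A_identity, y_r=0)
```

Under this sketch, a label that agrees with the consensus yields a small cross-entropy, while the trace term alone discourages confusion matrices from collapsing to the identity; the balance between the two is what lets annotator reliability and the consensus distribution be learned jointly.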
format Online
Article
Text
id pubmed-10533416
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-105334162023-09-29 Learning from multiple annotators for medical image segmentation Zhang, Le Tanno, Ryutaro Xu, Moucheng Huang, Yawen Bronik, Kevin Jin, Chen Jacob, Joseph Zheng, Yefeng Shao, Ling Ciccarelli, Olga Barkhof, Frederik Alexander, Daniel C. Pattern Recognit Article Supervised machine learning methods have been widely developed for segmentation tasks in recent years. However, the quality of labels has a high impact on the predictive performance of these algorithms. This issue is particularly acute in the medical image domain, where both the cost of annotation and the inter-observer variability are high. In a typical label acquisition process, different human experts contribute estimates of the “actual” segmentation labels, influenced by their personal biases and competency levels. The performance of automatic segmentation algorithms is limited when these noisy labels are used as the expert consensus label. In this work, we use two coupled CNNs to jointly learn, from purely noisy observations alone, the reliability of individual annotators and the expert consensus label distributions. The separation of the two is achieved by maximally describing the annotator’s “unreliable behavior” (we call it “maximally unreliable”) while achieving high fidelity with the noisy training data. We first create a toy segmentation dataset using MNIST and investigate the properties of the proposed algorithm. We then use three public medical imaging segmentation datasets to demonstrate our method’s efficacy, including both simulated (where necessary) and real-world annotations: 1) ISBI2015 (multiple-sclerosis lesions); 2) BraTS (brain tumors); 3) LIDC-IDRI (lung abnormalities). Finally, we create a real-world multiple sclerosis lesion dataset (QSMSC at UCL: Queen Square Multiple Sclerosis Center at UCL, UK) with manual segmentations from 4 different annotators (3 radiologists with different skill levels and 1 expert to generate the expert consensus label). In all datasets, our method consistently outperforms competing methods and relevant baselines, especially when the number of annotations is small and the amount of disagreement is large. The studies also reveal that the system is capable of capturing the complicated spatial characteristics of annotators’ mistakes. Elsevier 2023-06 /pmc/articles/PMC10533416/ /pubmed/37781685 http://dx.doi.org/10.1016/j.patcog.2023.109400 Text en © 2023 The Authors. Published by Elsevier Ltd. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Zhang, Le
Tanno, Ryutaro
Xu, Moucheng
Huang, Yawen
Bronik, Kevin
Jin, Chen
Jacob, Joseph
Zheng, Yefeng
Shao, Ling
Ciccarelli, Olga
Barkhof, Frederik
Alexander, Daniel C.
Learning from multiple annotators for medical image segmentation
title Learning from multiple annotators for medical image segmentation
title_full Learning from multiple annotators for medical image segmentation
title_fullStr Learning from multiple annotators for medical image segmentation
title_full_unstemmed Learning from multiple annotators for medical image segmentation
title_short Learning from multiple annotators for medical image segmentation
title_sort learning from multiple annotators for medical image segmentation
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10533416/
https://www.ncbi.nlm.nih.gov/pubmed/37781685
http://dx.doi.org/10.1016/j.patcog.2023.109400
work_keys_str_mv AT zhangle learningfrommultipleannotatorsformedicalimagesegmentation
AT tannoryutaro learningfrommultipleannotatorsformedicalimagesegmentation
AT xumoucheng learningfrommultipleannotatorsformedicalimagesegmentation
AT huangyawen learningfrommultipleannotatorsformedicalimagesegmentation
AT bronikkevin learningfrommultipleannotatorsformedicalimagesegmentation
AT jinchen learningfrommultipleannotatorsformedicalimagesegmentation
AT jacobjoseph learningfrommultipleannotatorsformedicalimagesegmentation
AT zhengyefeng learningfrommultipleannotatorsformedicalimagesegmentation
AT shaoling learningfrommultipleannotatorsformedicalimagesegmentation
AT ciccarelliolga learningfrommultipleannotatorsformedicalimagesegmentation
AT barkhoffrederik learningfrommultipleannotatorsformedicalimagesegmentation
AT alexanderdanielc learningfrommultipleannotatorsformedicalimagesegmentation