Learning from multiple annotators for medical image segmentation
Supervised machine learning methods have been widely developed for segmentation tasks in recent years. However, the quality of labels has high impact on the predictive performance of these algorithms. This issue is particularly acute in the medical image domain, where both the cost of annotation and...
Main Authors: | Zhang, Le, Tanno, Ryutaro, Xu, Moucheng, Huang, Yawen, Bronik, Kevin, Jin, Chen, Jacob, Joseph, Zheng, Yefeng, Shao, Ling, Ciccarelli, Olga, Barkhof, Frederik, Alexander, Daniel C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Elsevier 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10533416/ https://www.ncbi.nlm.nih.gov/pubmed/37781685 http://dx.doi.org/10.1016/j.patcog.2023.109400 |
_version_ | 1785112187585429504 |
---|---|
author | Zhang, Le Tanno, Ryutaro Xu, Moucheng Huang, Yawen Bronik, Kevin Jin, Chen Jacob, Joseph Zheng, Yefeng Shao, Ling Ciccarelli, Olga Barkhof, Frederik Alexander, Daniel C. |
author_facet | Zhang, Le Tanno, Ryutaro Xu, Moucheng Huang, Yawen Bronik, Kevin Jin, Chen Jacob, Joseph Zheng, Yefeng Shao, Ling Ciccarelli, Olga Barkhof, Frederik Alexander, Daniel C. |
author_sort | Zhang, Le |
collection | PubMed |
description | Supervised machine learning methods have been widely developed for segmentation tasks in recent years. However, the quality of labels has a high impact on the predictive performance of these algorithms. This issue is particularly acute in the medical image domain, where both the cost of annotation and the inter-observer variability are high. In a typical label acquisition process, different human experts contribute estimates of the "actual" segmentation labels, influenced by their personal biases and competency levels. The performance of automatic segmentation algorithms is limited when these noisy labels are used as the expert consensus label. In this work, we use two coupled CNNs to jointly learn, from purely noisy observations alone, the reliability of individual annotators and the expert consensus label distributions. The separation of the two is achieved by maximally describing each annotator's "unreliable behavior" (we call this "maximally unreliable") while achieving high fidelity with the noisy training data. We first create a toy segmentation dataset using MNIST and investigate the properties of the proposed algorithm. We then use three public medical imaging segmentation datasets to demonstrate our method's efficacy, including both simulated (where necessary) and real-world annotations: 1) ISBI2015 (multiple-sclerosis lesions); 2) BraTS (brain tumors); 3) LIDC-IDRI (lung abnormalities). Finally, we create a real-world multiple sclerosis lesion dataset (QSMSC at UCL: Queen Square Multiple Sclerosis Center at UCL, UK) with manual segmentations from 4 different annotators (3 radiologists with different skill levels and 1 expert to generate the expert consensus label). In all datasets, our method consistently outperforms competing methods and relevant baselines, especially when the number of annotations is small and the amount of disagreement is large. The studies also reveal that the system is capable of capturing the complicated spatial characteristics of annotators' mistakes. |
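The abstract describes modeling each annotator's reliability separately from a consensus label distribution. A minimal numerical sketch of that idea (not the authors' code; the variable names, the 2-class toy numbers, and the scalar penalty weight are our own assumptions) is a per-pixel confusion-matrix noisy-label model: a consensus network predicts class probabilities for a pixel, each annotator is modeled by a column-stochastic confusion matrix, and the product gives the distribution over that annotator's observed labels. Minimizing a trace term on the confusion matrices pushes them away from the identity, which is one way to realize the "maximally unreliable" separation the abstract mentions.

```python
import numpy as np

num_classes = 2

def softmax(z):
    # Numerically stable softmax over a 1-D logit vector.
    e = np.exp(z - z.max())
    return e / e.sum()

# Consensus network output for a single pixel (logits -> probabilities).
consensus_probs = softmax(np.array([2.0, 0.5]))

# One annotator's confusion matrix: entry [i, j] = P(annotator reports
# class i | true class j).  Columns sum to 1 (column-stochastic).
confusion = np.array([[0.9, 0.2],
                      [0.1, 0.8]])

# Predicted distribution over this annotator's *observed* noisy label.
noisy_probs = confusion @ consensus_probs

# Per-pixel training signal: cross-entropy of the observed noisy label
# under noisy_probs, plus a small trace penalty that discourages the
# confusion matrix from collapsing to the identity, so the consensus /
# annotator-reliability split stays identifiable.
observed_label = 0  # toy observed annotation for this pixel
loss = -np.log(noisy_probs[observed_label]) + 0.01 * np.trace(confusion)
```

In the full method this computation would run per pixel with CNN-predicted consensus probabilities and per-annotator, spatially varying confusion matrices; the toy version above only illustrates the forward model and the shape of the loss.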
format | Online Article Text |
id | pubmed-10533416 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Elsevier |
record_format | MEDLINE/PubMed |
spelling | pubmed-105334162023-09-29 Learning from multiple annotators for medical image segmentation Zhang, Le Tanno, Ryutaro Xu, Moucheng Huang, Yawen Bronik, Kevin Jin, Chen Jacob, Joseph Zheng, Yefeng Shao, Ling Ciccarelli, Olga Barkhof, Frederik Alexander, Daniel C. Pattern Recognit Article Supervised machine learning methods have been widely developed for segmentation tasks in recent years. However, the quality of labels has a high impact on the predictive performance of these algorithms. This issue is particularly acute in the medical image domain, where both the cost of annotation and the inter-observer variability are high. In a typical label acquisition process, different human experts contribute estimates of the "actual" segmentation labels, influenced by their personal biases and competency levels. The performance of automatic segmentation algorithms is limited when these noisy labels are used as the expert consensus label. In this work, we use two coupled CNNs to jointly learn, from purely noisy observations alone, the reliability of individual annotators and the expert consensus label distributions. The separation of the two is achieved by maximally describing each annotator's "unreliable behavior" (we call this "maximally unreliable") while achieving high fidelity with the noisy training data. We first create a toy segmentation dataset using MNIST and investigate the properties of the proposed algorithm. We then use three public medical imaging segmentation datasets to demonstrate our method's efficacy, including both simulated (where necessary) and real-world annotations: 1) ISBI2015 (multiple-sclerosis lesions); 2) BraTS (brain tumors); 3) LIDC-IDRI (lung abnormalities). Finally, we create a real-world multiple sclerosis lesion dataset (QSMSC at UCL: Queen Square Multiple Sclerosis Center at UCL, UK) with manual segmentations from 4 different annotators (3 radiologists with different skill levels and 1 expert to generate the expert consensus label). In all datasets, our method consistently outperforms competing methods and relevant baselines, especially when the number of annotations is small and the amount of disagreement is large. The studies also reveal that the system is capable of capturing the complicated spatial characteristics of annotators' mistakes. Elsevier 2023-06 /pmc/articles/PMC10533416/ /pubmed/37781685 http://dx.doi.org/10.1016/j.patcog.2023.109400 Text en © 2023 The Authors. Published by Elsevier Ltd. https://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/). |
spellingShingle | Article Zhang, Le Tanno, Ryutaro Xu, Moucheng Huang, Yawen Bronik, Kevin Jin, Chen Jacob, Joseph Zheng, Yefeng Shao, Ling Ciccarelli, Olga Barkhof, Frederik Alexander, Daniel C. Learning from multiple annotators for medical image segmentation |
title | Learning from multiple annotators for medical image segmentation |
title_full | Learning from multiple annotators for medical image segmentation |
title_fullStr | Learning from multiple annotators for medical image segmentation |
title_full_unstemmed | Learning from multiple annotators for medical image segmentation |
title_short | Learning from multiple annotators for medical image segmentation |
title_sort | learning from multiple annotators for medical image segmentation |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10533416/ https://www.ncbi.nlm.nih.gov/pubmed/37781685 http://dx.doi.org/10.1016/j.patcog.2023.109400 |
work_keys_str_mv | AT zhangle learningfrommultipleannotatorsformedicalimagesegmentation AT tannoryutaro learningfrommultipleannotatorsformedicalimagesegmentation AT xumoucheng learningfrommultipleannotatorsformedicalimagesegmentation AT huangyawen learningfrommultipleannotatorsformedicalimagesegmentation AT bronikkevin learningfrommultipleannotatorsformedicalimagesegmentation AT jinchen learningfrommultipleannotatorsformedicalimagesegmentation AT jacobjoseph learningfrommultipleannotatorsformedicalimagesegmentation AT zhengyefeng learningfrommultipleannotatorsformedicalimagesegmentation AT shaoling learningfrommultipleannotatorsformedicalimagesegmentation AT ciccarelliolga learningfrommultipleannotatorsformedicalimagesegmentation AT barkhoffrederik learningfrommultipleannotatorsformedicalimagesegmentation AT alexanderdanielc learningfrommultipleannotatorsformedicalimagesegmentation |