Agreement Between Experts and an Untrained Crowd for Identifying Dermoscopic Features Using a Gamified App: Reader Feasibility Study

BACKGROUND: Dermoscopy is commonly used for the evaluation of pigmented lesions, but agreement between experts for identification of dermoscopic structures is known to be relatively poor. Expert labeling of medical data is a bottleneck in the development of machine learning (ML) tools, and crowdsour...

Bibliographic Details
Main authors:	Kentley, Jonathan, Weber, Jochen, Liopyris, Konstantinos, Braun, Ralph P, Marghoob, Ashfaq A, Quigley, Elizabeth A, Nelson, Kelly, Prentice, Kira, Duhaime, Erik, Halpern, Allan C, Rotemberg, Veronica
Format:	Online Article Text
Language:	English
Published:	JMIR Publications 2023
Subjects:	Original Paper
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892985/ https://www.ncbi.nlm.nih.gov/pubmed/36652282 http://dx.doi.org/10.2196/38412

_version_	1784881430598254592
author	Kentley, Jonathan Weber, Jochen Liopyris, Konstantinos Braun, Ralph P Marghoob, Ashfaq A Quigley, Elizabeth A Nelson, Kelly Prentice, Kira Duhaime, Erik Halpern, Allan C Rotemberg, Veronica
collection	PubMed
description	BACKGROUND: Dermoscopy is commonly used for the evaluation of pigmented lesions, but agreement between experts for identification of dermoscopic structures is known to be relatively poor. Expert labeling of medical data is a bottleneck in the development of machine learning (ML) tools, and crowdsourcing has been demonstrated as a cost- and time-efficient method for the annotation of medical images. OBJECTIVE: The aim of this study is to demonstrate that crowdsourcing can be used to label basic dermoscopic structures from images of pigmented lesions with similar reliability to a group of experts. METHODS: First, we obtained labels of 248 images of melanocytic lesions with 31 dermoscopic “subfeatures” labeled by 20 dermoscopy experts. These were then collapsed into 6 dermoscopic “superfeatures” based on structural similarity, due to low interrater reliability (IRR): dots, globules, lines, network structures, regression structures, and vessels. These images were then used as the gold standard for the crowd study. The commercial platform DiagnosUs was used to obtain annotations from a nonexpert crowd for the presence or absence of the 6 superfeatures in each of the 248 images. We replicated this methodology with a group of 7 dermatologists to allow direct comparison with the nonexpert crowd. The Cohen κ value was used to measure agreement across raters. RESULTS: In total, we obtained 139,731 ratings of the 6 dermoscopic superfeatures from the crowd. There was relatively lower agreement for the identification of dots and globules (the median κ values were 0.526 and 0.395, respectively), whereas network structures and vessels showed the highest agreement (the median κ values were 0.581 and 0.798, respectively). This pattern was also seen among the expert raters, who had median κ values of 0.483 and 0.517 for dots and globules, respectively, and 0.758 and 0.790 for network structures and vessels. The median κ values between nonexperts and thresholded average–expert readers were 0.709 for dots, 0.719 for globules, 0.714 for lines, 0.838 for network structures, 0.818 for regression structures, and 0.728 for vessels. CONCLUSIONS: This study confirmed that IRR for different dermoscopic features varied among a group of experts; a similar pattern was observed in a nonexpert crowd. There was good or excellent agreement for each of the 6 superfeatures between the crowd and the experts, highlighting the similar reliability of the crowd for labeling dermoscopic images. This confirms the feasibility and dependability of using crowdsourcing as a scalable solution to annotate large sets of dermoscopic images, with several potential clinical and educational applications, including the development of novel, explainable ML tools.
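The abstract reports agreement as median Cohen κ values for present/absent ratings of each superfeature, and compares nonexperts against a thresholded average of expert readers. The sketch below illustrates how such figures could be computed for one feature. It is not the authors' code: the function names, the toy ratings, the 0.5 consensus threshold, and the convention of returning 1.0 when chance agreement is total are all assumptions made for illustration, and the abstract does not state how the study aggregated ratings.

```python
from itertools import combinations
from statistics import median


def cohen_kappa(a, b):
    """Cohen's kappa for two equal-length sequences of binary labels (0 or 1)."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(a)
    # Observed agreement: share of images on which both raters gave the same label.
    observed = sum(x == y for x, y in zip(a, b)) / n
    # Chance agreement from each rater's marginal rate of "present".
    pa, pb = sum(a) / n, sum(b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    if expected == 1.0:
        # Both raters used a single identical label; kappa is undefined here.
        # Returning 1.0 is a convention assumed for this sketch.
        return 1.0
    return (observed - expected) / (1 - expected)


def median_pairwise_kappa(raters):
    """Median kappa over all rater pairs; `raters` is a list of label lists."""
    return median(cohen_kappa(a, b) for a, b in combinations(raters, 2))


def threshold_consensus(raters, threshold=0.5):
    """Per-image consensus: 1 if the mean rating reaches `threshold` (assumed 0.5)."""
    n_raters = len(raters)
    return [1 if sum(col) / n_raters >= threshold else 0 for col in zip(*raters)]


if __name__ == "__main__":
    # Hypothetical ratings of one feature on 8 images (1 = present, 0 = absent).
    experts = [
        [1, 1, 0, 0, 1, 0, 1, 0],
        [1, 0, 0, 0, 1, 0, 1, 1],
        [1, 1, 0, 0, 1, 0, 0, 0],
    ]
    nonexpert = [1, 1, 0, 0, 1, 0, 1, 1]
    consensus = threshold_consensus(experts)
    print("median pairwise expert kappa:", median_pairwise_kappa(experts))
    print("nonexpert vs expert consensus kappa:", cohen_kappa(nonexpert, consensus))
```

Repeating this per superfeature would give one agreement figure per feature, which is the shape of the results the abstract reports.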
format	Online Article Text
id	pubmed-9892985
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	JMIR Publications
record_format	MEDLINE/PubMed
spelling	pubmed-9892985 2023-02-03 JMIR Med Inform Original Paper JMIR Publications 2023-01-18 /pmc/articles/PMC9892985/ /pubmed/36652282 http://dx.doi.org/10.2196/38412 Text en ©Jonathan Kentley, Jochen Weber, Konstantinos Liopyris, Ralph P Braun, Ashfaq A Marghoob, Elizabeth A Quigley, Kelly Nelson, Kira Prentice, Erik Duhaime, Allan C Halpern, Veronica Rotemberg. Originally published in JMIR Medical Informatics (https://medinform.jmir.org), 18.01.2023. This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in JMIR Medical Informatics, is properly cited. The complete bibliographic information, a link to the original publication on https://medinform.jmir.org/, as well as this copyright and license information must be included.
title	Agreement Between Experts and an Untrained Crowd for Identifying Dermoscopic Features Using a Gamified App: Reader Feasibility Study
topic	Original Paper
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9892985/ https://www.ncbi.nlm.nih.gov/pubmed/36652282 http://dx.doi.org/10.2196/38412