Quantifying acceptable artefact ranges for dermatologic classification algorithms

BACKGROUND: Many classifiers have been developed that can distinguish different types of skin lesions (e.g., benign nevi, melanoma) with varying degrees of success.(1–5) However, even successfully trained classifiers may perform poorly on images that include artefacts. While problems created by hair...

Bibliographic Details
Main Authors: Petrie, T.C., Larson, C., Heath, M., Samatham, R., Davis, A., Berry, E.G., Leachman, S.A.
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9060017/
https://www.ncbi.nlm.nih.gov/pubmed/35664971
http://dx.doi.org/10.1002/ski2.19
author Petrie, T.C.
Larson, C.
Heath, M.
Samatham, R.
Davis, A.
Berry, E.G.
Leachman, S.A.
collection PubMed
description BACKGROUND: Many classifiers have been developed that can distinguish different types of skin lesions (e.g., benign nevi, melanoma) with varying degrees of success.(1–5) However, even successfully trained classifiers may perform poorly on images that include artefacts. While problems created by hair and ink markings have been published, quantitative measurements of the effect of blur, colour and lighting variations on classification accuracy have not yet been reported, to our knowledge. OBJECTIVES: We created a system that measures the impact of various artefacts on machine learning accuracy. Our objectives were to (1) quantitatively identify the most egregious artefacts and (2) demonstrate how to assess a classification algorithm's accuracy when input images include artefacts. METHODS: We injected artefacts into dermatologic images using techniques that could each be controlled with a single variable. This allowed us to quantitatively evaluate the impact on accuracy. We trained two convolutional neural networks on two different binary classification tasks and measured the impact on dermoscopy images over a range of parameter values. The area under the curve and specificity-at-a-given-sensitivity values were measured for each artefact induced at each parameter value. RESULTS: General blur had the strongest negative effect on the "melanoma versus other" task. Conversely, shifting the hue towards blue had a more pronounced effect on the "suspicious versus follow" task. CONCLUSIONS: Classifiers should either mitigate artefacts or detect them. Images should be excluded from diagnosis/recommendation when artefacts are present in amounts outside the machine-perceived quality range. Failure to do so will reduce accuracy and impede approval from regulatory agencies.
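The evaluation loop described in METHODS (inject one artefact controlled by a single variable, then measure the area under the curve and the specificity at a given sensitivity at each parameter value) can be sketched roughly as follows. This is a minimal illustration and not the authors' implementation: the hue-rotation artefact, the `toy_score` stand-in classifier, the synthetic "images" and all function names are assumptions made for the example, and the record does not say how the authors computed their metrics.

```python
import colorsys


def shift_hue(pixels, delta):
    """Single-parameter artefact: rotate the hue of every pixel by `delta` turns.

    `pixels` is a list of (r, g, b) tuples with components in [0, 1].
    """
    shifted = []
    for r, g, b in pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        shifted.append(colorsys.hsv_to_rgb((h + delta) % 1.0, s, v))
    return shifted


def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as half)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def specificity_at_sensitivity(labels, scores, target_sensitivity):
    """Best specificity over all thresholds whose sensitivity meets the target."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    best = 0.0
    for threshold in sorted(set(scores)):
        sensitivity = sum(p >= threshold for p in pos) / len(pos)
        if sensitivity >= target_sensitivity:
            specificity = sum(n < threshold for n in neg) / len(neg)
            best = max(best, specificity)
    return best


def toy_score(pixels):
    """Hypothetical stand-in for a trained classifier: mean red intensity."""
    return sum(r for r, _, _ in pixels) / len(pixels)


# Synthetic data (assumption): "positive" images are reddish, "negative" greenish.
images = [[(0.8, 0.2, 0.2)] * 4, [(0.7, 0.3, 0.2)] * 4,
          [(0.2, 0.8, 0.2)] * 4, [(0.3, 0.7, 0.2)] * 4]
labels = [1, 1, 0, 0]

# Sweep the single artefact parameter and record both metrics at each value.
for delta in (0.0, 0.1, 0.25, 0.5):
    scores = [toy_score(shift_hue(img, delta)) for img in images]
    print(delta, auc(labels, scores),
          specificity_at_sensitivity(labels, scores, 0.95))
```

Plotting each metric against the artefact parameter gives a curve from which an acceptable range could be read off, for example the span of parameter values over which the metric stays above a chosen floor. A real study would swap in the trained network for `toy_score` and real dermoscopy images for the synthetic pixel lists.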
format Online
Article
Text
id pubmed-9060017
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-9060017 2022-06-04 Quantifying acceptable artefact ranges for dermatologic classification algorithms Petrie, T.C. Larson, C. Heath, M. Samatham, R. Davis, A. Berry, E.G. Leachman, S.A. Skin Health Dis Original Articles
John Wiley and Sons Inc. 2021-03-19 /pmc/articles/PMC9060017/ /pubmed/35664971 http://dx.doi.org/10.1002/ski2.19 Text en © 2021 The Authors. Skin Health and Disease published by John Wiley & Sons Ltd on behalf of British Association of Dermatologists. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title Quantifying acceptable artefact ranges for dermatologic classification algorithms
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9060017/
https://www.ncbi.nlm.nih.gov/pubmed/35664971
http://dx.doi.org/10.1002/ski2.19