Effects of Label Noise on Deep Learning-Based Skin Cancer Classification

Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects...

Bibliographic Details
Main authors: Hekler, Achim, Kather, Jakob N., Krieghoff-Henning, Eva, Utikal, Jochen S., Meier, Friedegund, Gellrich, Frank F., Upmeier zu Belzen, Julius, French, Lars, Schlager, Justin G., Ghoreschi, Kamran, Wilhelm, Tabea, Kutzner, Heinz, Berking, Carola, Heppt, Markus V., Haferkamp, Sebastian, Sondermann, Wiebke, Schadendorf, Dirk, Schilling, Bastian, Izar, Benjamin, Maron, Roman, Schmitt, Max, Fröhling, Stefan, Lipka, Daniel B., Brinker, Titus J.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7218064/
https://www.ncbi.nlm.nih.gov/pubmed/32435646
http://dx.doi.org/10.3389/fmed.2020.00177
_version_ 1783532718584233984
author Hekler, Achim
Kather, Jakob N.
Krieghoff-Henning, Eva
Utikal, Jochen S.
Meier, Friedegund
Gellrich, Frank F.
Upmeier zu Belzen, Julius
French, Lars
Schlager, Justin G.
Ghoreschi, Kamran
Wilhelm, Tabea
Kutzner, Heinz
Berking, Carola
Heppt, Markus V.
Haferkamp, Sebastian
Sondermann, Wiebke
Schadendorf, Dirk
Schilling, Bastian
Izar, Benjamin
Maron, Roman
Schmitt, Max
Fröhling, Stefan
Lipka, Daniel B.
Brinker, Titus J.
author_sort Hekler, Achim
collection PubMed
description Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNN) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross validation comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. With identical ground truths of training and test labels, high accuracies with 75.03% (95% CI: 74.39–75.66%) for dermatological and 73.80% (95% CI: 73.10–74.51%) for biopsy-verified labels can be achieved. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12–65.94%, p < 0.01) on a non-biopsy-verified and to 64.24% (95% CI: 62.66–65.83%, p < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise and future work should use biopsy-verified training images to mitigate this problem.
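The comparison described in the abstract, scoring the same predictions against two different ground truths, can be illustrated with a small sketch. Everything below is hypothetical: the labels are synthetic, the "classifier" is a stand-in that copies the dermatologist labels with random errors, the noise rates are arbitrary, and the normal-approximation interval is an assumption, since the abstract does not state how the confidence intervals were computed.

```python
import math
import random
from statistics import mean, stdev


def accuracy(predictions, labels):
    """Fraction of predictions that match the given reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mean_with_ci95(values):
    """Mean and normal-approximation 95% confidence interval of the mean."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


def flip_labels(labels, rate, rng):
    """Copy binary labels, flipping each entry with probability `rate`."""
    return [1 - y if rng.random() < rate else y for y in labels]


rng = random.Random(0)

# Synthetic stand-ins: 384 binary test labels (0 = nevus, 1 = melanoma).
biopsy = [rng.randint(0, 1) for _ in range(384)]
# Hypothetical dermatologist labels: disagree with biopsy on about 20% of cases.
dermatologist = flip_labels(biopsy, 0.2, rng)

same_truth, cross_truth = [], []
for _ in range(10):
    # Stand-in for a model trained on dermatologist labels (about 10% errors).
    predictions = flip_labels(dermatologist, 0.1, rng)
    same_truth.append(accuracy(predictions, dermatologist))
    cross_truth.append(accuracy(predictions, biopsy))

print("scored against dermatologist labels:", mean_with_ci95(same_truth))
print("scored against biopsy labels:", mean_with_ci95(cross_truth))
```

In this toy setup the predictions score lower against the biopsy labels only because the two label sets disagree. That is the kind of gap the abstract reports when training and test ground truths differ.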
format Online
Article
Text
id pubmed-7218064
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7218064 2020-05-20 Effects of Label Noise on Deep Learning-Based Skin Cancer Classification Front Med (Lausanne) Medicine Frontiers Media S.A. 2020-05-06 /pmc/articles/PMC7218064/ /pubmed/32435646 http://dx.doi.org/10.3389/fmed.2020.00177 Text en
Copyright © 2020 Hekler, Kather, Krieghoff-Henning, Utikal, Meier, Gellrich, Upmeier zu Belzen, French, Schlager, Ghoreschi, Wilhelm, Kutzner, Berking, Heppt, Haferkamp, Sondermann, Schadendorf, Schilling, Izar, Maron, Schmitt, Fröhling, Lipka and Brinker. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Effects of Label Noise on Deep Learning-Based Skin Cancer Classification
topic Medicine
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7218064/
https://www.ncbi.nlm.nih.gov/pubmed/32435646
http://dx.doi.org/10.3389/fmed.2020.00177