Effects of Label Noise on Deep Learning-Based Skin Cancer Classification
Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects...
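Main authors: | Hekler, Achim; Kather, Jakob N.; Krieghoff-Henning, Eva; Utikal, Jochen S.; Meier, Friedegund; Gellrich, Frank F.; Upmeier zu Belzen, Julius; French, Lars; Schlager, Justin G.; Ghoreschi, Kamran; Wilhelm, Tabea; Kutzner, Heinz; Berking, Carola; Heppt, Markus V.; Haferkamp, Sebastian; Sondermann, Wiebke; Schadendorf, Dirk; Schilling, Bastian; Izar, Benjamin; Maron, Roman; Schmitt, Max; Fröhling, Stefan; Lipka, Daniel B.; Brinker, Titus J. |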
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2020 |
Subjects: | Medicine |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7218064/ https://www.ncbi.nlm.nih.gov/pubmed/32435646 http://dx.doi.org/10.3389/fmed.2020.00177 |
_version_ | 1783532718584233984 |
---|---|
author | Hekler, Achim Kather, Jakob N. Krieghoff-Henning, Eva Utikal, Jochen S. Meier, Friedegund Gellrich, Frank F. Upmeier zu Belzen, Julius French, Lars Schlager, Justin G. Ghoreschi, Kamran Wilhelm, Tabea Kutzner, Heinz Berking, Carola Heppt, Markus V. Haferkamp, Sebastian Sondermann, Wiebke Schadendorf, Dirk Schilling, Bastian Izar, Benjamin Maron, Roman Schmitt, Max Fröhling, Stefan Lipka, Daniel B. Brinker, Titus J. |
author_sort | Hekler, Achim |
collection | PubMed |
description | Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNN) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross validation comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. With identical ground truths of training and test labels, high accuracies with 75.03% (95% CI: 74.39–75.66%) for dermatological and 73.80% (95% CI: 73.10–74.51%) for biopsy-verified labels can be achieved. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12–65.94%, p < 0.01) on a non-biopsy-verified and to 64.24% (95% CI: 62.66–65.83%, p < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise and future work should use biopsy-verified training images to mitigate this problem. |
format | Online Article Text |
id | pubmed-7218064 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-72180642020-05-20 Effects of Label Noise on Deep Learning-Based Skin Cancer Classification Hekler, Achim Kather, Jakob N. Krieghoff-Henning, Eva Utikal, Jochen S. Meier, Friedegund Gellrich, Frank F. Upmeier zu Belzen, Julius French, Lars Schlager, Justin G. Ghoreschi, Kamran Wilhelm, Tabea Kutzner, Heinz Berking, Carola Heppt, Markus V. Haferkamp, Sebastian Sondermann, Wiebke Schadendorf, Dirk Schilling, Bastian Izar, Benjamin Maron, Roman Schmitt, Max Fröhling, Stefan Lipka, Daniel B. Brinker, Titus J. Front Med (Lausanne) Medicine Recent studies have shown that deep learning is capable of classifying dermatoscopic images at least as well as dermatologists. However, many studies in skin cancer classification utilize non-biopsy-verified training images. This imperfect ground truth introduces a systematic error, but the effects on classifier performance are currently unknown. Here, we systematically examine the effects of label noise by training and evaluating convolutional neural networks (CNN) with 804 images of melanoma and nevi labeled either by dermatologists or by biopsy. The CNNs are evaluated on a test set of 384 images by means of 4-fold cross validation comparing the outputs with either the corresponding dermatological or the biopsy-verified diagnosis. With identical ground truths of training and test labels, high accuracies with 75.03% (95% CI: 74.39–75.66%) for dermatological and 73.80% (95% CI: 73.10–74.51%) for biopsy-verified labels can be achieved. However, if the CNN is trained and tested with different ground truths, accuracy drops significantly to 64.53% (95% CI: 63.12–65.94%, p < 0.01) on a non-biopsy-verified and to 64.24% (95% CI: 62.66–65.83%, p < 0.01) on a biopsy-verified test set. In conclusion, deep learning methods for skin cancer classification are highly sensitive to label noise and future work should use biopsy-verified training images to mitigate this problem. Frontiers Media S.A. 
2020-05-06 /pmc/articles/PMC7218064/ /pubmed/32435646 http://dx.doi.org/10.3389/fmed.2020.00177 Text en Copyright © 2020 Hekler, Kather, Krieghoff-Henning, Utikal, Meier, Gellrich, Upmeier zu Belzen, French, Schlager, Ghoreschi, Wilhelm, Kutzner, Berking, Heppt, Haferkamp, Sondermann, Schadendorf, Schilling, Izar, Maron, Schmitt, Fröhling, Lipka and Brinker. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Medicine Hekler, Achim Kather, Jakob N. Krieghoff-Henning, Eva Utikal, Jochen S. Meier, Friedegund Gellrich, Frank F. Upmeier zu Belzen, Julius French, Lars Schlager, Justin G. Ghoreschi, Kamran Wilhelm, Tabea Kutzner, Heinz Berking, Carola Heppt, Markus V. Haferkamp, Sebastian Sondermann, Wiebke Schadendorf, Dirk Schilling, Bastian Izar, Benjamin Maron, Roman Schmitt, Max Fröhling, Stefan Lipka, Daniel B. Brinker, Titus J. Effects of Label Noise on Deep Learning-Based Skin Cancer Classification |
title | Effects of Label Noise on Deep Learning-Based Skin Cancer Classification |
title_sort | effects of label noise on deep learning-based skin cancer classification |
topic | Medicine |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7218064/ https://www.ncbi.nlm.nih.gov/pubmed/32435646 http://dx.doi.org/10.3389/fmed.2020.00177 |
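
The evaluation summarized in the description field (scoring the same classifier outputs against either dermatologist or biopsy-verified labels, then reporting accuracy with a 95% CI) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the label arrays and fold accuracies are made-up toy values, the names `accuracy` and `mean_ci95` are hypothetical, and the normal-approximation interval is an assumption, since the record does not say how the published intervals were computed.

```python
import math
from statistics import mean, stdev


def accuracy(predictions, labels):
    """Fraction of predictions that match the given reference labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def mean_ci95(values):
    """Mean with a normal-approximation 95% CI (assumed method, not from the paper)."""
    m = mean(values)
    half_width = 1.96 * stdev(values) / math.sqrt(len(values))
    return m, m - half_width, m + half_width


# Toy data only: 1 = melanoma, 0 = nevus. These are not values from the study.
predictions = [1, 0, 1, 1, 0, 0, 1, 0]
dermatologist_labels = [1, 0, 1, 0, 0, 0, 1, 1]
biopsy_labels = [1, 0, 0, 1, 1, 1, 1, 0]

# The same outputs are scored against two different ground truths.
acc_dermatologist = accuracy(predictions, dermatologist_labels)
acc_biopsy = accuracy(predictions, biopsy_labels)

# Hypothetical per-fold accuracies from a 4-fold cross validation.
fold_accuracies = [0.75, 0.70, 0.80, 0.75]
m, lower, upper = mean_ci95(fold_accuracies)

print(f"accuracy vs dermatologist labels: {acc_dermatologist:.3f}")
print(f"accuracy vs biopsy labels:        {acc_biopsy:.3f}")
print(f"mean fold accuracy: {m:.3f} (95% CI: {lower:.3f} to {upper:.3f})")
```

Scoring one set of predictions against both label sets makes the dependence on the chosen ground truth visible: the classifier is unchanged, yet its measured accuracy differs with the reference labels.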