
The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions

Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 (“Human Against Machine with 10000 training images”) dataset. We collected...


Bibliographic Details
Main authors: Tschandl, Philipp; Rosendahl, Cliff; Kittler, Harald
Format: Online Article Text
Language: English
Published: Nature Publishing Group 2018
Subjects: Data Descriptor
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6091241/
https://www.ncbi.nlm.nih.gov/pubmed/30106392
http://dx.doi.org/10.1038/sdata.2018.161
collection PubMed
description Training of neural networks for automated diagnosis of pigmented skin lesions is hampered by the small size and lack of diversity of available datasets of dermatoscopic images. We tackle this problem by releasing the HAM10000 (“Human Against Machine with 10000 training images”) dataset. We collected dermatoscopic images from different populations acquired and stored by different modalities. Given this diversity we had to apply different acquisition and cleaning methods and developed semi-automatic workflows utilizing specifically trained neural networks. The final dataset consists of 10015 dermatoscopic images which are released as a training set for academic machine learning purposes and are publicly available through the ISIC archive. This benchmark dataset can be used for machine learning and for comparisons with human experts. Cases include a representative collection of all important diagnostic categories in the realm of pigmented lesions. More than 50% of lesions have been confirmed by pathology, while the ground truth for the rest of the cases was either follow-up, expert consensus, or confirmation by in-vivo confocal microscopy.
format Online
Article
Text
id pubmed-6091241
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Nature Publishing Group
record_format MEDLINE/PubMed
spelling pubmed-6091241 2018-08-24 Sci Data Data Descriptor Nature Publishing Group 2018-08-14 /pmc/articles/PMC6091241/ /pubmed/30106392 http://dx.doi.org/10.1038/sdata.2018.161 Text en Copyright © 2018, The Author(s) http://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/ The Creative Commons Public Domain Dedication waiver http://creativecommons.org/publicdomain/zero/1.0/ applies to the metadata files made available in this article.
topic Data Descriptor