Do We Train on Test Data? Purging CIFAR of Near-Duplicates
The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence, bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the “fair CIFAR” (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. The training set remains unchanged, in order not to invalidate pre-trained models. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. We find a significant drop in classification accuracy of between 9% and 14% relative to the original performance on the duplicate-free test set. We make both the ciFAIR dataset and pre-trained models publicly available and furthermore maintain a leaderboard for tracking the state of the art.
Main Authors: | Barz, Björn; Denzler, Joachim |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI 2020 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321059/ https://www.ncbi.nlm.nih.gov/pubmed/34460587 http://dx.doi.org/10.3390/jimaging6060041 |
_version_ | 1783730761628647424 |
author | Barz, Björn Denzler, Joachim |
author_facet | Barz, Björn Denzler, Joachim |
author_sort | Barz, Björn |
collection | PubMed |
description | The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence, bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the “fair CIFAR” (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. The training set remains unchanged, in order not to invalidate pre-trained models. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. We find a significant drop in classification accuracy of between 9% and 14% relative to the original performance on the duplicate-free test set. We make both the ciFAIR dataset and pre-trained models publicly available and furthermore maintain a leaderboard for tracking the state of the art. |
format | Online Article Text |
id | pubmed-8321059 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
spelling | pubmed-83210592021-08-26 Do We Train on Test Data? Purging CIFAR of Near-Duplicates Barz, Björn Denzler, Joachim J Imaging Article The CIFAR-10 and CIFAR-100 datasets are two of the most heavily benchmarked datasets in computer vision and are often used to evaluate novel methods and model architectures in the field of deep learning. However, we find that 3.3% and 10% of the images from the test sets of these datasets have duplicates in the training set. These duplicates are easily recognizable by memorization and may, hence, bias the comparison of image recognition techniques regarding their generalization capability. To eliminate this bias, we provide the “fair CIFAR” (ciFAIR) dataset, where we replaced all duplicates in the test sets with new images sampled from the same domain. The training set remains unchanged, in order not to invalidate pre-trained models. We then re-evaluate the classification performance of various popular state-of-the-art CNN architectures on these new test sets to investigate whether recent research has overfitted to memorizing data instead of learning abstract concepts. We find a significant drop in classification accuracy of between 9% and 14% relative to the original performance on the duplicate-free test set. We make both the ciFAIR dataset and pre-trained models publicly available and furthermore maintain a leaderboard for tracking the state of the art. MDPI 2020-06-02 /pmc/articles/PMC8321059/ /pubmed/34460587 http://dx.doi.org/10.3390/jimaging6060041 Text en © 2020 by the authors. https://creativecommons.org/licenses/by/4.0/Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/) ). |
spellingShingle | Article Barz, Björn Denzler, Joachim Do We Train on Test Data? Purging CIFAR of Near-Duplicates |
title | Do We Train on Test Data? Purging CIFAR of Near-Duplicates |
title_full | Do We Train on Test Data? Purging CIFAR of Near-Duplicates |
title_fullStr | Do We Train on Test Data? Purging CIFAR of Near-Duplicates |
title_full_unstemmed | Do We Train on Test Data? Purging CIFAR of Near-Duplicates |
title_short | Do We Train on Test Data? Purging CIFAR of Near-Duplicates |
title_sort | do we train on test data? purging cifar of near-duplicates |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8321059/ https://www.ncbi.nlm.nih.gov/pubmed/34460587 http://dx.doi.org/10.3390/jimaging6060041 |
work_keys_str_mv | AT barzbjorn dowetrainontestdatapurgingcifarofnearduplicates AT denzlerjoachim dowetrainontestdatapurgingcifarofnearduplicates |
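The abstract above notes that duplicated test images are "easily recognizable by memorization" and were located by comparing test images against the training set. As a rough illustration only (the authors' actual detection procedure is described in the full paper and differed from this), a brute-force pixel-space near-duplicate check could be sketched as:

```python
import numpy as np

def find_near_duplicates(train, test, threshold=0.02):
    """Flag test images whose nearest training image lies closer than
    `threshold` in mean squared pixel distance.

    train, test: float arrays of shape (n, H*W*C), values scaled to [0, 1].
    Returns a boolean mask over the test set. The threshold value and the
    pixel-space metric are illustrative assumptions, not the paper's method.
    """
    mask = np.zeros(len(test), dtype=bool)
    for i, img in enumerate(test):
        # mean squared distance from this test image to every training image
        d = np.mean((train - img) ** 2, axis=1)
        mask[i] = d.min() < threshold
    return mask

# toy example: three "training" images, two "test" images,
# the first test image being an exact copy of a training image
train = np.array([[0.0] * 4, [1.0] * 4, [0.5] * 4])
test = np.array([[1.0] * 4, [0.0, 1.0, 0.0, 1.0]])
print(find_near_duplicates(train, test))  # prints [ True False]
```

In practice, exact pixel matching misses near-duplicates produced by small crops or color shifts, which is why perceptual or learned-feature distances are typically preferred for this task.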