
Mapping global dynamics of benchmark creation and saturation in artificial intelligence

Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To f...


Bibliographic Details
Main authors:	Ott, Simon; Barbosa-Silva, Adriano; Blagec, Kathrin; Brauner, Jan; Samwald, Matthias
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2022
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9649641/ https://www.ncbi.nlm.nih.gov/pubmed/36357391 http://dx.doi.org/10.1038/s41467-022-34591-0

author	Ott, Simon Barbosa-Silva, Adriano Blagec, Kathrin Brauner, Jan Samwald, Matthias
collection	PubMed
description	Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curate data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trends towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks are prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility.
format	Online Article Text
id	pubmed-9649641
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-9649641 2022-11-15 Nat Commun Article Nature Publishing Group UK 2022-11-10 /pmc/articles/PMC9649641/ /pubmed/36357391 http://dx.doi.org/10.1038/s41467-022-34591-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/
Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title	Mapping global dynamics of benchmark creation and saturation in artificial intelligence
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9649641/ https://www.ncbi.nlm.nih.gov/pubmed/36357391 http://dx.doi.org/10.1038/s41467-022-34591-0


Similar Items