Mapping global dynamics of benchmark creation and saturation in artificial intelligence

Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To f...

Bibliographic Details
Main authors: Ott, Simon, Barbosa-Silva, Adriano, Blagec, Kathrin, Brauner, Jan, Samwald, Matthias
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9649641/
https://www.ncbi.nlm.nih.gov/pubmed/36357391
http://dx.doi.org/10.1038/s41467-022-34591-0
_version_ 1784827841064468480
author Ott, Simon
Barbosa-Silva, Adriano
Blagec, Kathrin
Brauner, Jan
Samwald, Matthias
author_facet Ott, Simon
Barbosa-Silva, Adriano
Blagec, Kathrin
Brauner, Jan
Samwald, Matthias
author_sort Ott, Simon
collection PubMed
description Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curate data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trends towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks are prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility.
format Online
Article
Text
id pubmed-9649641
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-96496412022-11-15 Mapping global dynamics of benchmark creation and saturation in artificial intelligence Ott, Simon Barbosa-Silva, Adriano Blagec, Kathrin Brauner, Jan Samwald, Matthias Nat Commun Article Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curate data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trends towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks are prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility. Nature Publishing Group UK 2022-11-10 /pmc/articles/PMC9649641/ /pubmed/36357391 http://dx.doi.org/10.1038/s41467-022-34591-0 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. 
If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Ott, Simon
Barbosa-Silva, Adriano
Blagec, Kathrin
Brauner, Jan
Samwald, Matthias
Mapping global dynamics of benchmark creation and saturation in artificial intelligence
title Mapping global dynamics of benchmark creation and saturation in artificial intelligence
title_full Mapping global dynamics of benchmark creation and saturation in artificial intelligence
title_fullStr Mapping global dynamics of benchmark creation and saturation in artificial intelligence
title_full_unstemmed Mapping global dynamics of benchmark creation and saturation in artificial intelligence
title_short Mapping global dynamics of benchmark creation and saturation in artificial intelligence
title_sort mapping global dynamics of benchmark creation and saturation in artificial intelligence
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9649641/
https://www.ncbi.nlm.nih.gov/pubmed/36357391
http://dx.doi.org/10.1038/s41467-022-34591-0
work_keys_str_mv AT ottsimon mappingglobaldynamicsofbenchmarkcreationandsaturationinartificialintelligence
AT barbosasilvaadriano mappingglobaldynamicsofbenchmarkcreationandsaturationinartificialintelligence
AT blageckathrin mappingglobaldynamicsofbenchmarkcreationandsaturationinartificialintelligence
AT braunerjan mappingglobaldynamicsofbenchmarkcreationandsaturationinartificialintelligence
AT samwaldmatthias mappingglobaldynamicsofbenchmarkcreationandsaturationinartificialintelligence