Mapping global dynamics of benchmark creation and saturation in artificial intelligence
Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To f...
Main authors: | Ott, Simon; Barbosa-Silva, Adriano; Blagec, Kathrin; Brauner, Jan; Samwald, Matthias |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9649641/ https://www.ncbi.nlm.nih.gov/pubmed/36357391 http://dx.doi.org/10.1038/s41467-022-34591-0 |
_version_ | 1784827841064468480 |
---|---|
author | Ott, Simon Barbosa-Silva, Adriano Blagec, Kathrin Brauner, Jan Samwald, Matthias |
author_sort | Ott, Simon |
collection | PubMed |
description | Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curate data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trends towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks are prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility. |
format | Online Article Text |
id | pubmed-9649641 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-9649641 2022-11-15 Mapping global dynamics of benchmark creation and saturation in artificial intelligence. Ott, Simon; Barbosa-Silva, Adriano; Blagec, Kathrin; Brauner, Jan; Samwald, Matthias. Nat Commun. Article. Benchmarks are crucial to measuring and steering progress in artificial intelligence (AI). However, recent studies raised concerns over the state of AI benchmarking, reporting issues such as benchmark overfitting, benchmark saturation and increasing centralization of benchmark dataset creation. To facilitate monitoring of the health of the AI benchmarking ecosystem, we introduce methodologies for creating condensed maps of the global dynamics of benchmark creation and saturation. We curate data for 3765 benchmarks covering the entire domains of computer vision and natural language processing, and show that a large fraction of benchmarks quickly trends towards near-saturation, that many benchmarks fail to find widespread utilization, and that benchmark performance gains for different AI tasks are prone to unforeseen bursts. We analyze attributes associated with benchmark popularity, and conclude that future benchmarks should emphasize versatility, breadth and real-world utility. Nature Publishing Group UK. 2022-11-10. /pmc/articles/PMC9649641/ /pubmed/36357391 http://dx.doi.org/10.1038/s41467-022-34591-0 Text en. © The Author(s) 2022. https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | Mapping global dynamics of benchmark creation and saturation in artificial intelligence |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9649641/ https://www.ncbi.nlm.nih.gov/pubmed/36357391 http://dx.doi.org/10.1038/s41467-022-34591-0 |