A framework for rigorous evaluation of human performance in human and machine learning comparison studies
Rigorous comparisons of human and machine learning algorithm performance on the same task help to support accurate claims about algorithm success rates and advance understanding of their performance relative to that of human performers. In turn, these comparisons are critical for supporting advance...
Main Authors: Cowley, Hannah P.; Natter, Mandy; Gray-Roncal, Karla; Rhodes, Rebecca E.; Johnson, Erik C.; Drenkow, Nathan; Shead, Timothy M.; Chance, Frances S.; Wester, Brock; Gray-Roncal, William
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK, 2022
Subjects: Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8971503/ https://www.ncbi.nlm.nih.gov/pubmed/35361786 http://dx.doi.org/10.1038/s41598-022-08078-3
_version_ | 1784679647562170368 |
author | Cowley, Hannah P.; Natter, Mandy; Gray-Roncal, Karla; Rhodes, Rebecca E.; Johnson, Erik C.; Drenkow, Nathan; Shead, Timothy M.; Chance, Frances S.; Wester, Brock; Gray-Roncal, William
author_facet | Cowley, Hannah P.; Natter, Mandy; Gray-Roncal, Karla; Rhodes, Rebecca E.; Johnson, Erik C.; Drenkow, Nathan; Shead, Timothy M.; Chance, Frances S.; Wester, Brock; Gray-Roncal, William
author_sort | Cowley, Hannah P. |
collection | PubMed |
description | Rigorous comparisons of human and machine learning algorithm performance on the same task help to support accurate claims about algorithm success rates and advance understanding of their performance relative to that of human performers. In turn, these comparisons are critical for supporting advances in artificial intelligence. However, the machine learning community has lacked a standardized, consensus framework for performing the evaluations of human performance necessary for comparison. We demonstrate common pitfalls in designing the human performance evaluation and propose a framework for the evaluation of human performance, illustrating guiding principles for a successful comparison. These principles are: first, to design the human evaluation with an understanding of the differences between human and algorithm cognition; second, to match trials between human participants and the algorithm evaluation; and third, to employ best practices for psychology research studies, such as the collection and analysis of supplementary and subjective data and adherence to ethical review protocols. We demonstrate our framework’s utility for designing a study to evaluate human performance on a one-shot learning task. Adoption of this common framework may provide a standard approach to evaluate algorithm performance and aid in the reproducibility of comparisons between human and machine learning algorithm performance. |
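Of the three principles in the abstract, the second, matching trials between the human participants and the algorithm evaluation, is the most mechanical and lends itself to a short sketch. The minimal Python example below is hypothetical (the function name, file name, and stimulus IDs are illustrative, not taken from the paper); it shows one way to draw a single reproducible trial set that both the human-study software and the algorithm harness load, so both are scored on identical inputs in identical order.

```python
import json
import random

def build_matched_trials(items, n_trials, seed=2022):
    """Sample a fixed, reproducible set of trials so that human
    participants and the algorithm are evaluated on identical inputs.

    items: candidate stimuli (e.g., image IDs)
    n_trials: number of trials to draw
    seed: fixed RNG seed; reusing it reproduces the same trial set
    """
    rng = random.Random(seed)  # isolated RNG, unaffected by global state
    return rng.sample(items, n_trials)

if __name__ == "__main__":
    # Hypothetical stimulus pool for a one-shot learning task.
    stimuli = [f"image_{i:04d}" for i in range(1000)]
    trials = build_matched_trials(stimuli, n_trials=50)

    # Serialize once; both the human-study software and the algorithm
    # harness read this same file, so trial content and order match.
    with open("matched_trials.json", "w") as f:
        json.dump(trials, f)
```

Serializing the sampled trials once, rather than re-sampling with a shared seed inside each pipeline, removes any chance of the two evaluations drifting apart if either pipeline's sampling code changes.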
format | Online Article Text |
id | pubmed-8971503 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-8971503 2022-04-05 A framework for rigorous evaluation of human performance in human and machine learning comparison studies Cowley, Hannah P.; Natter, Mandy; Gray-Roncal, Karla; Rhodes, Rebecca E.; Johnson, Erik C.; Drenkow, Nathan; Shead, Timothy M.; Chance, Frances S.; Wester, Brock; Gray-Roncal, William Sci Rep Article Rigorous comparisons of human and machine learning algorithm performance on the same task help to support accurate claims about algorithm success rates and advance understanding of their performance relative to that of human performers. In turn, these comparisons are critical for supporting advances in artificial intelligence. However, the machine learning community has lacked a standardized, consensus framework for performing the evaluations of human performance necessary for comparison. We demonstrate common pitfalls in designing the human performance evaluation and propose a framework for the evaluation of human performance, illustrating guiding principles for a successful comparison. These principles are: first, to design the human evaluation with an understanding of the differences between human and algorithm cognition; second, to match trials between human participants and the algorithm evaluation; and third, to employ best practices for psychology research studies, such as the collection and analysis of supplementary and subjective data and adherence to ethical review protocols. We demonstrate our framework’s utility for designing a study to evaluate human performance on a one-shot learning task. Adoption of this common framework may provide a standard approach to evaluate algorithm performance and aid in the reproducibility of comparisons between human and machine learning algorithm performance. Nature Publishing Group UK 2022-03-31 /pmc/articles/PMC8971503/ /pubmed/35361786 http://dx.doi.org/10.1038/s41598-022-08078-3 Text en © This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply 2022, corrected publication 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. |
spellingShingle | Article Cowley, Hannah P.; Natter, Mandy; Gray-Roncal, Karla; Rhodes, Rebecca E.; Johnson, Erik C.; Drenkow, Nathan; Shead, Timothy M.; Chance, Frances S.; Wester, Brock; Gray-Roncal, William A framework for rigorous evaluation of human performance in human and machine learning comparison studies |
title | A framework for rigorous evaluation of human performance in human and machine learning comparison studies |
title_full | A framework for rigorous evaluation of human performance in human and machine learning comparison studies |
title_fullStr | A framework for rigorous evaluation of human performance in human and machine learning comparison studies |
title_full_unstemmed | A framework for rigorous evaluation of human performance in human and machine learning comparison studies |
title_short | A framework for rigorous evaluation of human performance in human and machine learning comparison studies |
title_sort | framework for rigorous evaluation of human performance in human and machine learning comparison studies |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8971503/ https://www.ncbi.nlm.nih.gov/pubmed/35361786 http://dx.doi.org/10.1038/s41598-022-08078-3 |
work_keys_str_mv | AT cowleyhannahp aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT nattermandy aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT grayroncalkarla aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT rhodesrebeccae aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT johnsonerikc aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT drenkownathan aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT sheadtimothym aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT chancefrancess aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT westerbrock aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT grayroncalwilliam aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT cowleyhannahp frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT nattermandy frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT grayroncalkarla frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT rhodesrebeccae frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT johnsonerikc frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT drenkownathan frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT sheadtimothym frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT chancefrancess frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT westerbrock frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies AT grayroncalwilliam frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies |