
A framework for rigorous evaluation of human performance in human and machine learning comparison studies


Bibliographic Details
Main Authors: Cowley, Hannah P., Natter, Mandy, Gray-Roncal, Karla, Rhodes, Rebecca E., Johnson, Erik C., Drenkow, Nathan, Shead, Timothy M., Chance, Frances S., Wester, Brock, Gray-Roncal, William
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8971503/
https://www.ncbi.nlm.nih.gov/pubmed/35361786
http://dx.doi.org/10.1038/s41598-022-08078-3
_version_ 1784679647562170368
author Cowley, Hannah P.
Natter, Mandy
Gray-Roncal, Karla
Rhodes, Rebecca E.
Johnson, Erik C.
Drenkow, Nathan
Shead, Timothy M.
Chance, Frances S.
Wester, Brock
Gray-Roncal, William
author_facet Cowley, Hannah P.
Natter, Mandy
Gray-Roncal, Karla
Rhodes, Rebecca E.
Johnson, Erik C.
Drenkow, Nathan
Shead, Timothy M.
Chance, Frances S.
Wester, Brock
Gray-Roncal, William
author_sort Cowley, Hannah P.
collection PubMed
description Rigorous comparisons of human and machine learning algorithm performance on the same task help to support accurate claims about algorithm success rates and advance understanding of their performance relative to that of human performers. In turn, these comparisons are critical for supporting advances in artificial intelligence. However, the machine learning community has lacked a standardized, consensus framework for performing the evaluations of human performance necessary for comparison. We demonstrate common pitfalls in designing the human performance evaluation and propose a framework for the evaluation of human performance, illustrating guiding principles for a successful comparison. These principles are: first, to design the human evaluation with an understanding of the differences between human and algorithm cognition; second, to match trials between human participants and the algorithm evaluation; and third, to employ best practices for psychology research studies, such as the collection and analysis of supplementary and subjective data and adherence to ethical review protocols. We demonstrate our framework’s utility for designing a study to evaluate human performance on a one-shot learning task. Adoption of this common framework may provide a standard approach to evaluate algorithm performance and aid in the reproducibility of comparisons between human and machine learning algorithm performance.
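
As an illustration of the second principle above (matching trials between the human participants and the algorithm evaluation), the following is a minimal Python sketch, not drawn from the article itself; all names (build_trial_set, evaluate, subject_fn) are hypothetical. It shows one way to hold a single, reproducible trial set fixed so that human and algorithm results can be paired per trial rather than aggregated over different data.

    # Hypothetical sketch: both evaluations receive the *same* fixed trial set,
    # so per-trial outcomes can be paired for comparison.
    import random

    def build_trial_set(items, n_trials, seed=0):
        """Draw one fixed, seeded trial set shared by both evaluations."""
        rng = random.Random(seed)
        return rng.sample(items, n_trials)

    def evaluate(subject_fn, trials):
        """Run a subject (algorithm, or a callback collecting a human
        response) on the trials, keyed by trial for paired analysis."""
        return {trial: subject_fn(trial) for trial in trials}

    # Usage: pass the identical `trials` object to both evaluations.
    trials = build_trial_set(items=list(range(1000)), n_trials=50, seed=42)
    algo_results = evaluate(lambda t: t % 2 == 0, trials)       # stand-in algorithm
    # human_results = evaluate(collect_human_response, trials)  # gathered in-study
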
format Online
Article
Text
id pubmed-8971503
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8971503 2022-04-05 A framework for rigorous evaluation of human performance in human and machine learning comparison studies Cowley, Hannah P. Natter, Mandy Gray-Roncal, Karla Rhodes, Rebecca E. Johnson, Erik C. Drenkow, Nathan Shead, Timothy M. Chance, Frances S. Wester, Brock Gray-Roncal, William Sci Rep Article Rigorous comparisons of human and machine learning algorithm performance on the same task help to support accurate claims about algorithm success rates and advance understanding of their performance relative to that of human performers. In turn, these comparisons are critical for supporting advances in artificial intelligence. However, the machine learning community has lacked a standardized, consensus framework for performing the evaluations of human performance necessary for comparison. We demonstrate common pitfalls in designing the human performance evaluation and propose a framework for the evaluation of human performance, illustrating guiding principles for a successful comparison. These principles are: first, to design the human evaluation with an understanding of the differences between human and algorithm cognition; second, to match trials between human participants and the algorithm evaluation; and third, to employ best practices for psychology research studies, such as the collection and analysis of supplementary and subjective data and adherence to ethical review protocols. We demonstrate our framework’s utility for designing a study to evaluate human performance on a one-shot learning task. Adoption of this common framework may provide a standard approach to evaluate algorithm performance and aid in the reproducibility of comparisons between human and machine learning algorithm performance. Nature Publishing Group UK 2022-03-31 /pmc/articles/PMC8971503/ /pubmed/35361786 http://dx.doi.org/10.1038/s41598-022-08078-3 Text en © This is a U.S. Government work and not under copyright protection in the US; foreign copyright protection may apply 2022, corrected publication 2022 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Cowley, Hannah P.
Natter, Mandy
Gray-Roncal, Karla
Rhodes, Rebecca E.
Johnson, Erik C.
Drenkow, Nathan
Shead, Timothy M.
Chance, Frances S.
Wester, Brock
Gray-Roncal, William
A framework for rigorous evaluation of human performance in human and machine learning comparison studies
title A framework for rigorous evaluation of human performance in human and machine learning comparison studies
title_full A framework for rigorous evaluation of human performance in human and machine learning comparison studies
title_fullStr A framework for rigorous evaluation of human performance in human and machine learning comparison studies
title_full_unstemmed A framework for rigorous evaluation of human performance in human and machine learning comparison studies
title_short A framework for rigorous evaluation of human performance in human and machine learning comparison studies
title_sort framework for rigorous evaluation of human performance in human and machine learning comparison studies
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8971503/
https://www.ncbi.nlm.nih.gov/pubmed/35361786
http://dx.doi.org/10.1038/s41598-022-08078-3
work_keys_str_mv AT cowleyhannahp aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT nattermandy aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT grayroncalkarla aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT rhodesrebeccae aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT johnsonerikc aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT drenkownathan aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT sheadtimothym aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT chancefrancess aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT westerbrock aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT grayroncalwilliam aframeworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT cowleyhannahp frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT nattermandy frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT grayroncalkarla frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT rhodesrebeccae frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT johnsonerikc frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT drenkownathan frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT sheadtimothym frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT chancefrancess frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT westerbrock frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies
AT grayroncalwilliam frameworkforrigorousevaluationofhumanperformanceinhumanandmachinelearningcomparisonstudies