Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating


Bibliographic Details
Main Author: Condor, Aubrey
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334737/
http://dx.doi.org/10.1007/978-3-030-52240-7_14
collection PubMed
description This project proposes using BERT (Bidirectional Encoder Representations from Transformers) as a tool to assist educators with automated short answer grading (ASAG) as opposed to replacing human judgement in high-stakes scenarios. Many educators are hesitant to give authority to an automated system, especially in assessment tasks such as grading constructed response items. However, evaluating free-response text can be time and labor costly for one rater, let alone multiple raters. In addition, some degree of inconsistency exists within and between raters for assessing a given task. Recent advances in Natural Language Processing have resulted in subsequent improvements for technologies that rely on artificial intelligence and human language. New, state-of-the-art models such as BERT, an open source, pre-trained language model, have decreased the amount of training data needed for specific tasks and, in turn, have reduced the amount of human annotation necessary for producing a high-quality classification model. After training BERT on expert ratings of constructed responses, we use subsequent automated grading to calculate Cohen’s Kappa as a measure of inter-rater reliability between the automated system and the human rater. For practical application, when the inter-rater reliability metric is unsatisfactory, we suggest that the human rater(s) use the automated model to call attention to ratings where a second opinion might be needed to confirm the rater’s correctness and consistency of judgement.
id pubmed-7334737
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
series Artificial Intelligence in Education
Published online 2020-06-10. Full text: /pmc/articles/PMC7334737/ | http://dx.doi.org/10.1007/978-3-030-52240-7_14
© Springer Nature Switzerland AG 2020. This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
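The workflow the description outlines — score the fine-tuned model's grades against a human rater with Cohen's Kappa, then flag disagreements for a second opinion — can be sketched in a few lines. This is an illustrative sketch only, not the author's code: the binary correct/incorrect labels and the disagreement-flagging rule are assumptions made for the example.

```python
from collections import Counter

def cohen_kappa(rater_a, rater_b):
    """Chance-corrected agreement between two raters over the same items."""
    n = len(rater_a)
    # Observed agreement: fraction of items both raters label identically.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Expected agreement under chance, from each rater's label frequencies.
    labels = set(rater_a) | set(rater_b)
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum((freq_a[lab] / n) * (freq_b[lab] / n) for lab in labels)
    return (observed - expected) / (1 - expected)

# Hypothetical correct(1)/incorrect(0) grades for six constructed responses.
human = [1, 0, 1, 1, 0, 1]   # expert rater
model = [1, 0, 1, 0, 0, 1]   # automated grader (simulated output)

kappa = cohen_kappa(human, model)  # 2/3 for these toy labels
# When agreement is unsatisfactory, surface the disagreements for review.
flagged = [i for i, (h, m) in enumerate(zip(human, model)) if h != m]
```

With these toy labels only item 3 would be routed to a second rater; what counts as an "unsatisfactory" Kappa is a threshold left to the human rater's judgement, as the paper proposes.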