Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating
Main author: | Condor, Aubrey |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334737/ http://dx.doi.org/10.1007/978-3-030-52240-7_14 |
_version_ | 1783553994844536832 |
---|---|
author | Condor, Aubrey |
author_facet | Condor, Aubrey |
author_sort | Condor, Aubrey |
collection | PubMed |
description | This project proposes using BERT (Bidirectional Encoder Representations from Transformers) as a tool to assist educators with automated short answer grading (ASAG), as opposed to replacing human judgement in high-stakes scenarios. Many educators are hesitant to give authority to an automated system, especially in assessment tasks such as grading constructed response items. However, evaluating free-response text can be costly in time and labor for one rater, let alone multiple raters. In addition, some degree of inconsistency exists within and between raters for assessing a given task. Recent advances in Natural Language Processing have resulted in subsequent improvements for technologies that rely on artificial intelligence and human language. New, state-of-the-art models such as BERT, an open-source, pre-trained language model, have decreased the amount of training data needed for specific tasks and, in turn, have reduced the amount of human annotation necessary for producing a high-quality classification model. After training BERT on expert ratings of constructed responses, we use subsequent automated grading to calculate Cohen's Kappa as a measure of inter-rater reliability between the automated system and the human rater. For practical application, when the inter-rater reliability metric is unsatisfactory, we suggest that the human rater(s) use the automated model to call attention to ratings where a second opinion might be needed to confirm the rater's correctness and consistency of judgement. |
format | Online Article Text |
id | pubmed-7334737 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7334737 2020-07-06 Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating Condor, Aubrey Artificial Intelligence in Education Article
2020-06-10 /pmc/articles/PMC7334737/ http://dx.doi.org/10.1007/978-3-030-52240-7_14 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Condor, Aubrey Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_full | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_fullStr | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_full_unstemmed | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_short | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_sort | exploring automatic short answer grading as a tool to assist in human rating |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334737/ http://dx.doi.org/10.1007/978-3-030-52240-7_14 |
work_keys_str_mv | AT condoraubrey exploringautomaticshortanswergradingasatooltoassistinhumanrating |
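The workflow described in the abstract — comparing human ratings against automated (BERT-based) ratings via Cohen's Kappa, then flagging disagreements for a second opinion — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ratings, the agreement threshold, and the flagging logic are all hypothetical, and Kappa is computed directly from its definition.

```python
# Hypothetical sketch of the rating-comparison step described in the abstract:
# Cohen's Kappa between a human rater and an automated model, with
# disagreeing items flagged for review when agreement is unsatisfactory.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's Kappa for two raters scoring the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the chance agreement implied by each rater's marginals.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical 3-level rubric scores (0 = incorrect, 1 = partial, 2 = correct).
human = [1, 0, 2, 1, 1, 0, 2, 2]   # expert ratings (illustrative)
model = [1, 0, 2, 0, 1, 0, 2, 1]   # automated ratings (illustrative)

kappa = cohens_kappa(human, model)

THRESHOLD = 0.7  # assumed cutoff; the paper does not prescribe a specific value
if kappa < THRESHOLD:
    # Flag items where the raters disagree, so a second opinion can be sought.
    flagged = [i for i, (h, m) in enumerate(zip(human, model)) if h != m]
    print(f"kappa={kappa:.2f}; review items {flagged}")
```

For the illustrative data above, 6 of 8 items agree and the marginal distributions give a chance agreement of 21/64, so kappa ≈ 0.63, below the assumed 0.7 threshold, and items 3 and 7 would be flagged for human review.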