Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating
Main author: | Condor, Aubrey |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334737/ http://dx.doi.org/10.1007/978-3-030-52240-7_14 |
_version_ | 1783553994844536832 |
---|---|
author | Condor, Aubrey |
author_facet | Condor, Aubrey |
author_sort | Condor, Aubrey |
collection | PubMed |
description | This project proposes using BERT (Bidirectional Encoder Representations from Transformers) as a tool to assist educators with automated short answer grading (ASAG), as opposed to replacing human judgement in high-stakes scenarios. Many educators are hesitant to give authority to an automated system, especially in assessment tasks such as grading constructed response items. However, evaluating free-response text can be costly in time and labor for one rater, let alone multiple raters. In addition, some degree of inconsistency exists within and between raters for assessing a given task. Recent advances in Natural Language Processing have resulted in subsequent improvements for technologies that rely on artificial intelligence and human language. New, state-of-the-art models such as BERT, an open-source, pre-trained language model, have decreased the amount of training data needed for specific tasks and, in turn, have reduced the amount of human annotation necessary for producing a high-quality classification model. After training BERT on expert ratings of constructed responses, we use subsequent automated grading to calculate Cohen's Kappa as a measure of inter-rater reliability between the automated system and the human rater. For practical application, when the inter-rater reliability metric is unsatisfactory, we suggest that the human rater(s) use the automated model to call attention to ratings where a second opinion might be needed to confirm the rater's correctness and consistency of judgement. |
format | Online Article Text |
id | pubmed-7334737 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
record_format | MEDLINE/PubMed |
spelling | pubmed-7334737 2020-07-06 Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating Condor, Aubrey Artificial Intelligence in Education Article
2020-06-10 /pmc/articles/PMC7334737/ http://dx.doi.org/10.1007/978-3-030-52240-7_14 Text en © Springer Nature Switzerland AG 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Condor, Aubrey Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_full | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_fullStr | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_full_unstemmed | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_short | Exploring Automatic Short Answer Grading as a Tool to Assist in Human Rating |
title_sort | exploring automatic short answer grading as a tool to assist in human rating |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334737/ http://dx.doi.org/10.1007/978-3-030-52240-7_14 |
work_keys_str_mv | AT condoraubrey exploringautomaticshortanswergradingasatooltoassistinhumanrating |
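The workflow described in the abstract — comparing human ratings against automated (BERT-based) ratings via Cohen's Kappa, then flagging disagreements for a second opinion — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the ratings, the agreement threshold, and the flagging logic are all hypothetical, and Kappa is computed directly from its definition.

```python
# Hypothetical sketch of the rating-comparison step described in the abstract:
# Cohen's Kappa between a human rater and an automated model, with
# disagreeing items flagged for review when agreement is unsatisfactory.
from collections import Counter

def cohens_kappa(a, b):
    """Cohen's Kappa for two raters scoring the same items.

    kappa = (p_observed - p_expected) / (1 - p_expected), where
    p_expected is the chance agreement implied by each rater's marginals.
    """
    assert len(a) == len(b) and len(a) > 0
    n = len(a)
    p_observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    p_expected = sum(ca[k] * cb[k] for k in set(a) | set(b)) / (n * n)
    return (p_observed - p_expected) / (1 - p_expected)

# Hypothetical 3-level rubric scores (0 = incorrect, 1 = partial, 2 = correct).
human = [1, 0, 2, 1, 1, 0, 2, 2]   # expert ratings (illustrative)
model = [1, 0, 2, 0, 1, 0, 2, 1]   # automated ratings (illustrative)

kappa = cohens_kappa(human, model)

THRESHOLD = 0.7  # assumed cutoff; the paper does not prescribe a specific value
if kappa < THRESHOLD:
    # Flag items where the raters disagree, so a second opinion can be sought.
    flagged = [i for i, (h, m) in enumerate(zip(human, model)) if h != m]
    print(f"kappa={kappa:.2f}; review items {flagged}")
```

For the illustrative data above, 6 of 8 items agree and the marginal distributions give a chance agreement of 21/64, so kappa ≈ 0.63, below the assumed 0.7 threshold, and items 3 and 7 would be flagged for human review.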