Computerization of Off-Topic Essay Detection: A possibility?

Checking essays written by students is a very time-consuming task. Besides spelling and grammar, essays also need to be evaluated on their semantic content, such as cohesion and coherence. In this study we focus on one such aspect of semantic content: the topic of the essay. Putting it forma...

Bibliographic Details
Main authors: Shahzad, Areeba; Wali, Aamir
Format: Online Article Text
Language: English
Published: Springer US 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8769773/
https://www.ncbi.nlm.nih.gov/pubmed/35075343
http://dx.doi.org/10.1007/s10639-021-10863-y
_version_ 1784635216346742784
author Shahzad, Areeba
Wali, Aamir
collection PubMed
description Checking essays written by students is a very time-consuming task. Besides spelling and grammar, essays also need to be evaluated on their semantic content, such as cohesion and coherence. In this study we focus on one such aspect of semantic content: the topic of the essay. Formally, given a prompt (or essay statement) and an essay, this study addresses the problem of predicting whether the essay is off-topic using machine learning techniques. With the increase in online learning and evaluation platforms, especially during the COVID-19 pandemic, an off-topic detection system can be very useful for checking essays submitted online. In this paper, we answer the question: given a prompt and an essay written in Pakistani English, can the detection of off-topic essays be reliably and completely automated, with zero human intervention, using currently available tools and techniques? For this purpose, we explore and implement various embedding techniques proposed in recent years to extract similarity or dissimilarity features between question and response, and compare the performance of these techniques using 10 benchmark datasets and 6 classifiers. Across the classifiers, datasets, and embeddings tested, we conclude that combining word mover's distance, average word embeddings, and IDF-weighted word embeddings, and then using a random forest classifier, is the best combination for off-topic essay detection. The accuracy obtained is 93.5%.
format Online
Article
Text
id pubmed-8769773
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Springer US
record_format MEDLINE/PubMed
spelling pubmed-8769773 2022-01-20 Computerization of Off-Topic Essay Detection: A possibility? Shahzad, Areeba; Wali, Aamir. Educ Inf Technol (Dordr). Article.
Springer US 2022-01-20 2022 /pmc/articles/PMC8769773/ /pubmed/35075343 http://dx.doi.org/10.1007/s10639-021-10863-y Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Computerization of Off-Topic Essay Detection: A possibility?
topic Article
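
The abstract names three prompt-essay features: average word embeddings, IDF-weighted word embeddings, and word mover's distance. The following is only a minimal sketch of what such features could look like, not the authors' implementation. Everything in it is an assumption for illustration: the toy 3-dimensional vectors, the function names, the smoothed IDF formula, and the use of a relaxed word mover's distance (each prompt word moves to its nearest essay word, which is a lower bound on the full distance) in place of the exact optimal-transport computation. The classifier step is omitted.

```python
import math
from collections import Counter

# Toy 3-dimensional word vectors (hypothetical values, for illustration only).
TOY_VECTORS = {
    "school": (0.9, 0.1, 0.0),
    "teacher": (0.8, 0.2, 0.1),
    "exam": (0.7, 0.3, 0.0),
    "cricket": (0.1, 0.9, 0.2),
    "match": (0.0, 0.8, 0.3),
    "team": (0.1, 0.7, 0.4),
}


def idf_weights(documents):
    """Smoothed inverse document frequency over tokenized documents."""
    n = len(documents)
    df = Counter(tok for doc in documents for tok in set(doc))
    return {tok: math.log((1 + n) / (1 + c)) + 1.0 for tok, c in df.items()}


def mean_vector(tokens, vectors, weights=None):
    """Average (optionally weighted) of the vectors of in-vocabulary tokens."""
    known = [t for t in tokens if t in vectors]
    if not known:
        raise ValueError("no in-vocabulary tokens")
    dim = len(vectors[known[0]])
    total = [0.0] * dim
    weight_sum = 0.0
    for t in known:
        w = 1.0 if weights is None else weights.get(t, 1.0)
        weight_sum += w
        for i, x in enumerate(vectors[t]):
            total[i] += w * x
    return [x / weight_sum for x in total]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def relaxed_wmd(src, dst, vectors):
    """Relaxed word mover's distance: every source token travels to its
    nearest target token. This is a lower bound on the full distance."""
    src = [t for t in src if t in vectors]
    dst = [t for t in dst if t in vectors]
    if not src or not dst:
        raise ValueError("no in-vocabulary tokens")
    return sum(
        min(math.dist(vectors[s], vectors[d]) for d in dst) for s in src
    ) / len(src)


def features(prompt, essay, vectors, idf):
    """Return (average-embedding cosine, IDF-weighted cosine, relaxed WMD)."""
    avg_sim = cosine(mean_vector(prompt, vectors), mean_vector(essay, vectors))
    idf_sim = cosine(
        mean_vector(prompt, vectors, idf), mean_vector(essay, vectors, idf)
    )
    return avg_sim, idf_sim, relaxed_wmd(prompt, essay, vectors)


prompt = ["school", "exam"]
on_topic_essay = ["teacher", "exam", "school"]
off_topic_essay = ["cricket", "match", "team"]

idf = idf_weights([prompt, on_topic_essay, off_topic_essay])
on_topic = features(prompt, on_topic_essay, TOY_VECTORS, idf)
off_topic = features(prompt, off_topic_essay, TOY_VECTORS, idf)
```

In a fuller pipeline, feature tuples like `on_topic` and `off_topic` would be stacked into a matrix and passed to a classifier such as a random forest, with pretrained embeddings replacing the toy vectors.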