Computerization of Off-Topic Essay Detection: A possibility?
Checking essays written by students is a very time-consuming task. Besides spelling and grammar, essays also need to be evaluated on their semantic content, such as cohesion and coherence. In this study we focus on one such aspect of semantic content: the topic of the essay. Putting it forma...
Main authors: | Shahzad, Areeba; Wali, Aamir |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8769773/ https://www.ncbi.nlm.nih.gov/pubmed/35075343 http://dx.doi.org/10.1007/s10639-021-10863-y |
_version_ | 1784635216346742784 |
---|---|
author | Shahzad, Areeba Wali, Aamir |
author_facet | Shahzad, Areeba Wali, Aamir |
author_sort | Shahzad, Areeba |
collection | PubMed |
description | Checking essays written by students is a very time-consuming task. Besides spelling and grammar, essays also need to be evaluated on their semantic content, such as cohesion and coherence. In this study we focus on one such aspect of semantic content: the topic of the essay. Formally, given a prompt (essay statement) and an essay, this study addresses the problem of predicting whether the essay is off-topic using machine learning techniques. With the increase in online learning and evaluation platforms, especially during the COVID-19 pandemic, an off-topic detection system can be very useful for checking essays that are mainly submitted online. In this paper, we answer the question: given a prompt and an essay written in Pakistani English, can the process of detecting whether the essay is off-topic be reliably and completely automated, with zero human intervention, using currently available tools and techniques? For this purpose, we explore and implement various embedding techniques proposed in recent years to extract similarity or dissimilarity features between question and response, and compare the performance of these techniques using 10 benchmark data sets and 6 classifiers. Across the different classifiers, datasets, and embeddings, we conclude that combining word mover's distance, average word embeddings, and IDF-weighted word embeddings, and then using random forest as the classifier, is the best combination for off-topic essay detection. The accuracy obtained is 93.5%. |
format | Online Article Text |
id | pubmed-8769773 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | Springer US |
record_format | MEDLINE/PubMed |
spelling | pubmed-8769773 2022-01-20 Computerization of Off-Topic Essay Detection: A possibility? Shahzad, Areeba Wali, Aamir Educ Inf Technol (Dordr) Article Springer US 2022-01-20 2022 /pmc/articles/PMC8769773/ /pubmed/35075343 http://dx.doi.org/10.1007/s10639-021-10863-y Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2021 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Shahzad, Areeba Wali, Aamir Computerization of Off-Topic Essay Detection: A possibility? |
title | Computerization of Off-Topic Essay Detection: A possibility? |
title_full | Computerization of Off-Topic Essay Detection: A possibility? |
title_fullStr | Computerization of Off-Topic Essay Detection: A possibility? |
title_full_unstemmed | Computerization of Off-Topic Essay Detection: A possibility? |
title_short | Computerization of Off-Topic Essay Detection: A possibility? |
title_sort | computerization of off-topic essay detection: a possibility? |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8769773/ https://www.ncbi.nlm.nih.gov/pubmed/35075343 http://dx.doi.org/10.1007/s10639-021-10863-y |
work_keys_str_mv | AT shahzadareeba computerizationofofftopicessaydetectionapossibility AT waliaamir computerizationofofftopicessaydetectionapossibility |
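
The description field names three prompt-essay similarity features (word mover's distance, average word embeddings, and IDF-weighted word embeddings) that are combined and passed to a random forest. The sketch below illustrates how such a feature vector might be computed; it is not the authors' implementation. The 3-dimensional `TOY_VECTORS`, the helper names, and the example texts are all invented for illustration, and the distance used is the relaxed word mover's distance (a cheaper lower bound on the exact measure), not the exact optimal-transport solution.

```python
import math
from collections import Counter

# Toy 3-d word vectors, invented for this sketch. A real system would load
# pretrained embeddings (for example word2vec or GloVe) instead.
TOY_VECTORS = {
    "pollution": (0.9, 0.1, 0.0),
    "smog": (0.8, 0.2, 0.0),
    "city": (0.6, 0.3, 0.1),
    "air": (0.7, 0.2, 0.1),
    "cricket": (0.0, 0.9, 0.1),
    "match": (0.1, 0.8, 0.2),
    "team": (0.1, 0.9, 0.0),
}
DIM = 3


def tokens(text):
    """Lowercase, split on whitespace, keep only words that have a vector."""
    return [w for w in text.lower().split() if w in TOY_VECTORS]


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def weighted_mean(words, weight):
    """Weighted average of the word vectors of `words`."""
    total = sum(weight(w) for w in words)
    if total == 0:
        return (0.0,) * DIM
    return tuple(
        sum(weight(w) * TOY_VECTORS[w][i] for w in words) / total
        for i in range(DIM)
    )


def idf_table(documents):
    """Smoothed inverse document frequency over tokenized documents."""
    n = len(documents)
    df = Counter(w for doc in documents for w in set(doc))
    return {w: math.log((1 + n) / (1 + c)) + 1.0 for w, c in df.items()}


def relaxed_wmd(source, target):
    """Each source word sends all of its (uniform) mass to its nearest target word."""
    if not source or not target:
        return float("inf")
    return sum(
        min(math.dist(TOY_VECTORS[s], TOY_VECTORS[t]) for t in target)
        for s in source
    ) / len(source)


def features(prompt, essay, idf):
    """Return [average-embedding cosine, IDF-weighted cosine, relaxed WMD]."""
    p, e = tokens(prompt), tokens(essay)
    avg_sim = cosine(
        weighted_mean(p, lambda w: 1.0), weighted_mean(e, lambda w: 1.0)
    )
    idf_sim = cosine(
        weighted_mean(p, lambda w: idf.get(w, 1.0)),
        weighted_mean(e, lambda w: idf.get(w, 1.0)),
    )
    # Taking the larger of the two directions gives the tighter lower bound.
    distance = max(relaxed_wmd(p, e), relaxed_wmd(e, p))
    return [avg_sim, idf_sim, distance]


# Hypothetical prompt and essays, used only to exercise the functions.
prompt = "air pollution city"
on_topic_essay = "smog city air"
off_topic_essay = "cricket match team"

idf = idf_table([tokens(t) for t in (prompt, on_topic_essay, off_topic_essay)])
on_topic_features = features(prompt, on_topic_essay, idf)
off_topic_features = features(prompt, off_topic_essay, idf)
```

In this toy setup the on-topic essay scores higher on both cosine features and lower on the distance feature than the off-topic one. Feature vectors of this shape, computed for labelled prompt-essay pairs, could then be used to train a classifier such as scikit-learn's `RandomForestClassifier`; that step is omitted here to keep the sketch dependency-free.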