Computerization of Off-Topic Essay Detection: A possibility?
Main authors: | , |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer US 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8769773/ https://www.ncbi.nlm.nih.gov/pubmed/35075343 http://dx.doi.org/10.1007/s10639-021-10863-y |
Summary: | Checking essays written by students is a very time-consuming task. Besides spelling and grammar, essays also need to be evaluated on their semantic content, such as cohesion and coherence. In this study we focus on one such aspect of semantic content: the topic of the essay. Formally, given a prompt (essay statement) and an essay, this study addresses the problem of predicting whether the essay is off-topic using machine learning techniques. With the growth of online learning and evaluation platforms, especially during the COVID-19 pandemic, an off-topic detection system can be very useful for checking essays that are mainly submitted online. In this paper, we answer the question: given a prompt and an essay written in Pakistani English, can the detection of off-topic essays be reliably and completely automated, with zero human intervention, using currently available tools and techniques? For this purpose, we explore and implement various embedding techniques proposed in recent years to extract similarity or dissimilarity features between question and response, and compare the performance of these techniques using 10 benchmark data sets and 6 classifiers. Across the different classifiers, data sets, and embeddings, we conclude that combining Word Mover's Distance, average word embeddings, and IDF-weighted word embeddings, and then using a random forest as the classifier, is the best combination for off-topic essay detection. The accuracy obtained is 93.5%. |
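The summary names three prompt-to-essay feature families (Word Mover's Distance, average word embeddings, and IDF-weighted word embeddings) that feed a random forest. The following is a minimal sketch of how such features might be computed, not the authors' implementation. The three-dimensional `TOY_VECTORS`, the helper names, and the example texts are all hypothetical. A relaxed Word Mover's Distance (each word travels to its nearest counterpart, which gives a lower bound on the exact distance) stands in for the exact distance, which would need an optimal-transport solver. The classifier step is left out.

```python
import math
from collections import Counter

# Hypothetical toy word vectors; a real system would load pretrained embeddings.
TOY_VECTORS = {
    "school": (0.9, 0.1, 0.0),
    "teacher": (0.8, 0.2, 0.1),
    "exam": (0.7, 0.3, 0.0),
    "students": (0.85, 0.15, 0.05),
    "cricket": (0.1, 0.9, 0.2),
    "match": (0.0, 0.8, 0.3),
    "bowler": (0.1, 0.85, 0.25),
}


def tokens(text):
    """Lowercase whitespace tokenization, keeping only words that have a vector."""
    return [w for w in text.lower().split() if w in TOY_VECTORS]


def weighted_mean(words, weights):
    """Weighted average of the vectors of `words` (assumes `words` is non-empty)."""
    total = sum(weights[w] for w in words)
    dim = len(next(iter(TOY_VECTORS.values())))
    return tuple(
        sum(weights[w] * TOY_VECTORS[w][i] for w in words) / total
        for i in range(dim)
    )


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def idf_table(documents):
    """Smoothed inverse document frequency over a small reference corpus."""
    n = len(documents)
    df = Counter(w for doc in documents for w in set(tokens(doc)))
    return {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in TOY_VECTORS}


def relaxed_wmd(src, dst):
    """One-sided relaxed WMD: every source word moves to its nearest target word."""
    return sum(
        min(euclidean(TOY_VECTORS[s], TOY_VECTORS[d]) for d in dst) for s in src
    ) / len(src)


def similarity_features(prompt, essay, idf):
    """Return [average-embedding cosine, IDF-weighted cosine, relaxed WMD]."""
    p, e = tokens(prompt), tokens(essay)
    if not p or not e:
        raise ValueError("prompt and essay must each contain a known word")
    uniform = {w: 1.0 for w in TOY_VECTORS}
    return [
        cosine(weighted_mean(p, uniform), weighted_mean(e, uniform)),
        cosine(weighted_mean(p, idf), weighted_mean(e, idf)),
        # Taking the larger of the two one-sided values gives the tighter bound.
        max(relaxed_wmd(p, e), relaxed_wmd(e, p)),
    ]


prompt = "school exam"
corpus = [prompt, "teacher students exam", "cricket match bowler"]
idf = idf_table(corpus)
on_topic = similarity_features(prompt, "teacher students exam", idf)
off_topic = similarity_features(prompt, "cricket match bowler", idf)
```

In this toy setup the on-topic essay scores higher on both cosine features and lower on the distance feature than the off-topic one. Feature rows of this shape, one per prompt-essay pair, are what a classifier such as a random forest would then be trained on.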