Automatic Grading System Using Sentence-BERT Network

Bibliographic Details
Main Authors: Ndukwe, Ifeanyi G., Amadi, Chukwudi E., Nkomo, Larian M., Daniel, Ben K.
Format: Online Article Text
Language: English
Published: 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7334738/
http://dx.doi.org/10.1007/978-3-030-52240-7_41
Description
Summary: The integration of digital learning technologies into higher education enhances students’ learning by providing opportunities such as online examinations. However, many online examinations tend to have multiple-choice questions, as the marking of text-based questions can be a tedious task for academic staff, especially in large classes. In this study, we utilised SBERT, a pre-trained neural network language model, to perform automatic grading of three variations of short answer questions on an Introduction to Networking Computer Science subject. A sample of 228 near-graduation Information Science students from one research-intensive tertiary institution in West Africa participated in this study. The course instructor manually rated the short answers provided by the participants using a scoring rubric, awarding scores ranging from 0 to 5. Some of the manually graded answers were randomly selected and used as a training set to fine-tune the neural network language model. Quadratic-weighted kappa (QWKappa) was then used to test the level of agreement between the ratings given by the human rater and those generated by the language model on three variations of questions: description, comparison and listing. Further, the accuracy of the model was tested on the same questions. Overall, the results showed that the level of inter-rater agreement was good on all three varieties of questions. The accuracy measures also showed that the model performed very well on the comparison and description questions relative to the listing question.
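
The agreement measure named in the summary, quadratic-weighted kappa, can be illustrated with a short sketch. This is not the authors' code: the function name `quadratic_weighted_kappa` and the example scores are assumptions made purely for illustration, and only the 0 to 5 score range is taken from the summary. The statistic compares the observed disagreement between two raters with the disagreement expected by chance, penalising each disagreement by the squared distance between the two scores.

```python
def quadratic_weighted_kappa(human, model, min_score=0, max_score=5):
    """Quadratic-weighted kappa between two equal-length lists of integer scores.

    1.0 means perfect agreement, 0.0 means chance-level agreement.
    The function name and defaults are illustrative, not from the study.
    """
    if len(human) != len(model) or not human:
        raise ValueError("score lists must be non-empty and of equal length")

    n = max_score - min_score + 1          # number of score categories
    total = len(human)

    # Observed confusion matrix: rows = human score, columns = model score.
    observed = [[0] * n for _ in range(n)]
    for h, m in zip(human, model):
        observed[h - min_score][m - min_score] += 1

    # Marginal histograms for each rater.
    hist_human = [sum(row) for row in observed]
    hist_model = [sum(observed[i][j] for i in range(n)) for j in range(n)]

    weighted_observed = 0.0
    weighted_expected = 0.0
    for i in range(n):
        for j in range(n):
            weight = (i - j) ** 2 / (n - 1) ** 2   # quadratic penalty in [0, 1]
            expected = hist_human[i] * hist_model[j] / total
            weighted_observed += weight * observed[i][j]
            weighted_expected += weight * expected

    if weighted_expected == 0:
        # Both raters used one and the same score for every answer.
        return 1.0
    return 1.0 - weighted_observed / weighted_expected


# Hypothetical scores for eight answers (not data from the study).
human_scores = [5, 4, 3, 0, 2, 5, 1, 4]
model_scores = [5, 3, 3, 1, 2, 4, 1, 4]
print(quadratic_weighted_kappa(human_scores, model_scores))
```

Because the weights grow with the square of the score difference, a model that is off by one point is penalised far less than one that is off by four, which suits an ordinal rubric such as the 0 to 5 scale described above.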