A semi supervised approach to Arabic aspect category detection using Bert and teacher-student model

Bibliographic details
Main authors: Almasri, Miada; Al-Malki, Norah; Alotaibi, Reem
Format: Online article, text
Language: English
Published: PeerJ Inc., 2023
Subjects: Data Mining and Machine Learning
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10280399/
https://www.ncbi.nlm.nih.gov/pubmed/37346563
http://dx.doi.org/10.7717/peerj-cs.1425
Collection: PubMed
Description: Aspect-based sentiment analysis tasks are well researched in English. However, such research is lacking for the Arabic language, especially for aspect category detection. Most of this research focuses on supervised machine learning methods that require large, labeled datasets. Therefore, the aim of this research is to implement a semi-supervised self-training approach that uses a noisy student framework to enhance the capability of a deep learning model, AraBERT v02. The objective is to perform aspect category detection on both the SemEval 2016 hotel review dataset and the Hotel Arabic-Reviews Dataset (HARD) 2016. The framework has four steps. First, a teacher model is trained on the aspect categories of the labeled SemEval 2016 dataset. Second, the teacher model generates pseudo labels for the unlabeled HARD dataset. Third, a noisy student model is trained on the combined datasets (∼1 million sentences), with the aim of minimizing the combined cross-entropy loss. Fourth, the teacher and student models are ensembled to enhance the performance of AraBERT. Findings indicate that the ensembled teacher-student model achieves a 0.3% improvement in micro F1 over the initial noisy student implementation, with both models predicting the aspect categories in the combined datasets. It also achieves a 1% improvement over the micro F1 of the teacher model. These results outperform both the baselines and the other deep learning models discussed in the related literature.
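The description names the four steps but gives no code. The following is a minimal, framework-free Python sketch of that data flow, not the authors' implementation. It rests on several assumptions. Aspect category detection is treated as multi-label classification with one sigmoid probability per category. The model itself (AraBERT v02 in the paper) is abstracted away as lists of probabilities. Random token dropout stands in for the student's noise. Pseudo labels come from a 0.5 threshold. The ensemble is a weighted average of teacher and student probabilities. All function names and toy numbers are hypothetical.

```python
import math
import random
from typing import List, Sequence

Probs = List[float]    # assumed: one sigmoid probability per aspect category
MultiHot = List[int]   # 1 if the sentence mentions that category, else 0


def pseudo_label(teacher_probs: Sequence[Probs], threshold: float = 0.5) -> List[MultiHot]:
    """Step 2: turn teacher probabilities on unlabeled sentences into hard multi-hot labels."""
    return [[1 if p >= threshold else 0 for p in row] for row in teacher_probs]


def add_input_noise(tokens: Sequence[str], drop_prob: float, rng: random.Random) -> List[str]:
    """Step 3 noise (assumption): randomly drop tokens before the student sees a sentence."""
    kept = [t for t in tokens if rng.random() >= drop_prob]
    return kept or list(tokens[:1])  # keep at least one token if the input had any


def bce(probs: Probs, labels: Sequence[int], eps: float = 1e-7) -> float:
    """Mean binary cross-entropy over categories for one sentence."""
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(labels)


def combined_loss(student_on_labeled: Sequence[Probs], gold: Sequence[MultiHot],
                  student_on_unlabeled: Sequence[Probs], pseudo: Sequence[MultiHot]) -> float:
    """Step 3: one average over gold-labeled and pseudo-labeled sentences together."""
    losses = [bce(p, y) for p, y in zip(student_on_labeled, gold)]
    losses += [bce(p, y) for p, y in zip(student_on_unlabeled, pseudo)]
    return sum(losses) / len(losses)


def ensemble(teacher_probs: Sequence[Probs], student_probs: Sequence[Probs],
             weight: float = 0.5) -> List[Probs]:
    """Step 4: weighted average of teacher and student probabilities per category."""
    return [[weight * t + (1.0 - weight) * s for t, s in zip(tr, sr)]
            for tr, sr in zip(teacher_probs, student_probs)]


def micro_f1(pred: Sequence[MultiHot], gold: Sequence[MultiHot]) -> float:
    """Micro F1: pool true positives, false positives and false negatives over all categories."""
    tp = fp = fn = 0
    for pr, gr in zip(pred, gold):
        for p, g in zip(pr, gr):
            tp += int(p == 1 and g == 1)
            fp += int(p == 1 and g == 0)
            fn += int(p == 0 and g == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


if __name__ == "__main__":
    # Toy probabilities for two sentences and three hypothetical categories.
    teacher = [[0.9, 0.2, 0.6], [0.1, 0.7, 0.4]]
    student = [[0.8, 0.4, 0.3], [0.3, 0.9, 0.2]]
    gold = [[1, 0, 1], [0, 1, 0]]
    final = pseudo_label(ensemble(teacher, student))  # same 0.5 thresholding rule
    print(final, micro_f1(final, gold))
```

Averaging probabilities before thresholding is only one way to ensemble. Voting over thresholded labels is another. The description does not say which approach the authors used.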
Record ID: pubmed-10280399
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PeerJ Comput Sci (section: Data Mining and Machine Learning), published 2023-06-08
License: © 2023 Almasri et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.