Multi-task learning for toxic comment classification and rationale extraction

Social media content moderation is the standard practice as of today to promote healthy discussion forums. Toxic span prediction is helpful for explaining the toxic comment classification labels and is thus an important step towards building automated moderation systems. The relation between toxic comm...

Bibliographic Details
Main Authors:	Nelatoori, Kiran Babu, Kommanti, Hima Bindu
Format:	Online Article Text
Language:	English
Published:	Springer US 2022
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9391651/ https://www.ncbi.nlm.nih.gov/pubmed/36034685 http://dx.doi.org/10.1007/s10844-022-00726-4

_version_	1784770896285663232
author	Nelatoori, Kiran Babu Kommanti, Hima Bindu
author_facet	Nelatoori, Kiran Babu Kommanti, Hima Bindu
author_sort	Nelatoori, Kiran Babu
collection	PubMed
description	Social media content moderation is the standard practice as of today to promote healthy discussion forums. Toxic span prediction is helpful for explaining the toxic comment classification labels and is thus an important step towards building automated moderation systems. The relation between toxic comment classification and toxic span prediction makes a joint learning objective meaningful. We propose a multi-task learning model using ToxicXLMR for bidirectional contextual embeddings of input text for toxic comment classification, and a Bi-LSTM CRF layer for toxic span or rationale identification. To enable multi-task learning in this domain, we have curated a dataset from the Jigsaw and Toxic span prediction datasets. The proposed model outperformed the single-task models on the curated and toxic span prediction datasets with 4% and 2% improvement for classification and rationale identification, respectively. We investigated the domain adaptation ability of the proposed MTL model on the HASOC and OLID datasets, which contain out-of-domain text from Twitter, and found a 3% improvement in the F1 score over single-task models.
format	Online Article Text
id	pubmed-9391651
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Springer US
record_format	MEDLINE/PubMed
spelling	pubmed-9391651 2022-08-22 Multi-task learning for toxic comment classification and rationale extraction Nelatoori, Kiran Babu Kommanti, Hima Bindu J Intell Inf Syst Article Social media content moderation is the standard practice as of today to promote healthy discussion forums. Toxic span prediction is helpful for explaining the toxic comment classification labels and is thus an important step towards building automated moderation systems. The relation between toxic comment classification and toxic span prediction makes a joint learning objective meaningful. We propose a multi-task learning model using ToxicXLMR for bidirectional contextual embeddings of input text for toxic comment classification, and a Bi-LSTM CRF layer for toxic span or rationale identification. To enable multi-task learning in this domain, we have curated a dataset from the Jigsaw and Toxic span prediction datasets. The proposed model outperformed the single-task models on the curated and toxic span prediction datasets with 4% and 2% improvement for classification and rationale identification, respectively. We investigated the domain adaptation ability of the proposed MTL model on the HASOC and OLID datasets, which contain out-of-domain text from Twitter, and found a 3% improvement in the F1 score over single-task models. Springer US 2022-08-20 2023 /pmc/articles/PMC9391651/ /pubmed/36034685 http://dx.doi.org/10.1007/s10844-022-00726-4 Text en © The Author(s), under exclusive licence to Springer Science+Business Media, LLC, part of Springer Nature 2022, Springer Nature or its licensor holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Article Nelatoori, Kiran Babu Kommanti, Hima Bindu Multi-task learning for toxic comment classification and rationale extraction
title	Multi-task learning for toxic comment classification and rationale extraction
title_full	Multi-task learning for toxic comment classification and rationale extraction
title_fullStr	Multi-task learning for toxic comment classification and rationale extraction
title_full_unstemmed	Multi-task learning for toxic comment classification and rationale extraction
title_short	Multi-task learning for toxic comment classification and rationale extraction
title_sort	multi-task learning for toxic comment classification and rationale extraction
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9391651/ https://www.ncbi.nlm.nih.gov/pubmed/36034685 http://dx.doi.org/10.1007/s10844-022-00726-4
work_keys_str_mv	AT nelatoorikiranbabu multitasklearningfortoxiccommentclassificationandrationaleextraction AT kommantihimabindu multitasklearningfortoxiccommentclassificationandrationaleextraction
