Cross-Domain Authorship Attribution Using Pre-trained Language Models

Authorship attribution attempts to identify the authors behind texts and has important applications mainly in cyber-security, digital humanities and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution where texts of known authorship (training set...

Bibliographic Details
Main Authors:	Barlas, Georgios, Stamatatos, Efstathios
Format:	Online Article Text
Language:	English
Published:	2020
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256385/ http://dx.doi.org/10.1007/978-3-030-49161-1_22

_version_	1783539896488558592
author	Barlas, Georgios Stamatatos, Efstathios
author_facet	Barlas, Georgios Stamatatos, Efstathios
author_sort	Barlas, Georgios
collection	PubMed
description	Authorship attribution attempts to identify the authors behind texts and has important applications mainly in cyber-security, digital humanities and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution where texts of known authorship (training set) differ from texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a controlled corpus covering several text genres where topic and genre is specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution.
format	Online Article Text
id	pubmed-7256385
institution	National Center for Biotechnology Information
language	English
publishDate	2020
record_format	MEDLINE/PubMed
spelling	pubmed-72563852020-05-29 Cross-Domain Authorship Attribution Using Pre-trained Language Models Barlas, Georgios Stamatatos, Efstathios Artificial Intelligence Applications and Innovations Article Authorship attribution attempts to identify the authors behind texts and has important applications mainly in cyber-security, digital humanities and social media analytics. An especially challenging but very realistic scenario is cross-domain attribution where texts of known authorship (training set) differ from texts of disputed authorship (test set) in topic or genre. In this paper, we modify a successful authorship verification approach based on a multi-headed neural network language model and combine it with pre-trained language models. Based on experiments on a controlled corpus covering several text genres where topic and genre is specifically controlled, we demonstrate that the proposed approach achieves very promising results. We also demonstrate the crucial effect of the normalization corpus in cross-domain attribution. 2020-05-06 /pmc/articles/PMC7256385/ http://dx.doi.org/10.1007/978-3-030-49161-1_22 Text en © IFIP International Federation for Information Processing 2020 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
spellingShingle	Article Barlas, Georgios Stamatatos, Efstathios Cross-Domain Authorship Attribution Using Pre-trained Language Models
title	Cross-Domain Authorship Attribution Using Pre-trained Language Models
title_full	Cross-Domain Authorship Attribution Using Pre-trained Language Models
title_fullStr	Cross-Domain Authorship Attribution Using Pre-trained Language Models
title_full_unstemmed	Cross-Domain Authorship Attribution Using Pre-trained Language Models
title_short	Cross-Domain Authorship Attribution Using Pre-trained Language Models
title_sort	cross-domain authorship attribution using pre-trained language models
topic	Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7256385/ http://dx.doi.org/10.1007/978-3-030-49161-1_22
work_keys_str_mv	AT barlasgeorgios crossdomainauthorshipattributionusingpretrainedlanguagemodels AT stamatatosefstathios crossdomainauthorshipattributionusingpretrainedlanguagemodels
