Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover

The recent improvements of language models have drawn much attention to potential cases of use and abuse of automatically generated text. Great effort is put into the development of methods to detect machine generations among human-written text in order to avoid scenarios in which the large-scale ge...

Bibliographic Details
Main authors:	Fröhling, Leon; Zubiaga, Arkaitz
Format:	Online Article Text
Language:	English
Published:	PeerJ Inc. 2021
Subjects:	Artificial Intelligence
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049133/ https://www.ncbi.nlm.nih.gov/pubmed/33954234 http://dx.doi.org/10.7717/peerj-cs.443

author	Fröhling, Leon Zubiaga, Arkaitz
collection	PubMed
description	The recent improvements of language models have drawn much attention to potential cases of use and abuse of automatically generated text. Great effort is put into the development of methods to detect machine generations among human-written text in order to avoid scenarios in which the large-scale generation of text with minimal cost and effort undermines the trust in human interaction and factual information online. While most of the current approaches rely on the availability of expensive language models, we propose a simple feature-based classifier for the detection problem, using carefully crafted features that attempt to model intrinsic differences between human and machine text. Our research contributes to the field in producing a detection method that achieves performance competitive with far more expensive methods, offering an accessible “first line-of-defense” against the abuse of language models. Furthermore, our experiments show that different sampling methods lead to different types of flaws in generated text.
format	Online Article Text
id	pubmed-8049133
institution	National Center for Biotechnology Information
language	English
publishDate	2021
publisher	PeerJ Inc.
record_format	MEDLINE/PubMed
spelling	pubmed-8049133 2021-05-04 Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover Fröhling, Leon Zubiaga, Arkaitz PeerJ Comput Sci Artificial Intelligence The recent improvements of language models have drawn much attention to potential cases of use and abuse of automatically generated text. Great effort is put into the development of methods to detect machine generations among human-written text in order to avoid scenarios in which the large-scale generation of text with minimal cost and effort undermines the trust in human interaction and factual information online. While most of the current approaches rely on the availability of expensive language models, we propose a simple feature-based classifier for the detection problem, using carefully crafted features that attempt to model intrinsic differences between human and machine text. Our research contributes to the field in producing a detection method that achieves performance competitive with far more expensive methods, offering an accessible “first line-of-defense” against the abuse of language models. Furthermore, our experiments show that different sampling methods lead to different types of flaws in generated text. PeerJ Inc. 2021-04-06 /pmc/articles/PMC8049133/ /pubmed/33954234 http://dx.doi.org/10.7717/peerj-cs.443 Text en © 2021 Fröhling and Zubiaga https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title	Feature-based detection of automated language models: tackling GPT-2, GPT-3 and Grover
topic	Artificial Intelligence
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8049133/ https://www.ncbi.nlm.nih.gov/pubmed/33954234 http://dx.doi.org/10.7717/peerj-cs.443
