A model-independent redundancy measure for human versus ChatGPT authorship discrimination using a Bayesian probabilistic approach

The academic and scientific world in general is increasingly concerned about their inability to determine and ascertain the identity of the writer of a text. More and more often the question arises as to whether a scientific article or work handed in by a student was actually produced by the alleged...

Bibliographic Details
Main Authors:	Bozza, Silvia; Roten, Claude-Alain; Jover, Antoine; Cammarota, Valentina; Pousaz, Lionel; Taroni, Franco
Format:	Online Article Text
Language:	English
Published:	Nature Publishing Group UK 2023
Subjects:	Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10628141/ https://www.ncbi.nlm.nih.gov/pubmed/37932415 http://dx.doi.org/10.1038/s41598-023-46390-8

author	Bozza, Silvia; Roten, Claude-Alain; Jover, Antoine; Cammarota, Valentina; Pousaz, Lionel; Taroni, Franco
author_sort	Bozza, Silvia
collection	PubMed
description	The academic and scientific world in general is increasingly concerned about their inability to determine and ascertain the identity of the writer of a text. More and more often the question arises as to whether a scientific article or work handed in by a student was actually produced by the alleged author of the questioned text. The role of artificial intelligence (AI) is increasingly debated due to its dangers of undeclared use. A current example is undoubtedly the undeclared use of ChatGPT to write a scientific text. The article promotes an AI model-independent redundancy measure to support discrimination between hypotheses on authorship of various multilingual texts written by humans or produced by intelligence media such as ChatGPT. The syntax of texts written by humans tends to differ from that of texts produced by AIs. This difference can be grasped and quantified even with short texts (i.e. 1800 characters). This aspect of length is extremely important, because short texts imply a greater difficulty of analysis to characterize authorship. To meet the efficiency criteria required for the evaluation of forensic evidence, a probabilistic approach is implemented. In particular, to assess the value of the redundancy measure and to offer a consistent classification criterion, a metric called Bayes factor is implemented. The proposed Bayesian probabilistic method represents an original approach in stylometry. Analyses performed over multilingual texts (English and French) covering different scientific and human areas of interest (forensic science and socio-psycho-artistic topics) reveal the feasibility of a successful authorship discrimination with limited misclassification rates. Model performance is satisfactory even with small sample sizes.
format	Online Article Text
id	pubmed-10628141
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Nature Publishing Group UK
record_format	MEDLINE/PubMed
spelling	pubmed-10628141 2023-11-08 Sci Rep Article Nature Publishing Group UK 2023-11-06 /pmc/articles/PMC10628141/ /pubmed/37932415 http://dx.doi.org/10.1038/s41598-023-46390-8 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title	A model-independent redundancy measure for human versus ChatGPT authorship discrimination using a Bayesian probabilistic approach
topic	Article