
Fighting reviewer fatigue or amplifying bias? Considerations and recommendations for use of ChatGPT and other Large Language Models in scholarly peer review

BACKGROUND: The emergence of systems based on large language models (LLMs) such as OpenAI’s ChatGPT has created a range of discussions in scholarly circles. Since LLMs generate grammatically correct and mostly relevant (yet sometimes outright wrong, irrelevant or biased) outputs in response to provided prompts, using them in various writing tasks, including writing peer review reports, could improve productivity. Given the significance of peer review in the existing scholarly publication landscape, exploring the challenges and opportunities of using LLMs in peer review seems urgent. Now that the first scholarly outputs have been generated with LLMs, we anticipate that peer review reports, too, will be generated with the help of these systems. However, there are currently no guidelines on how these systems should be used in review tasks. METHODS: To investigate the potential impact of using LLMs on the peer review process, we used the five core themes within discussions about peer review suggested by Tennant and Ross-Hellauer: 1) reviewers’ role, 2) editors’ role, 3) functions and quality of peer reviews, 4) reproducibility, and 5) the social and epistemic functions of peer reviews. We provide a small-scale exploration of ChatGPT’s performance regarding the identified issues. RESULTS: LLMs have the potential to substantially alter the roles of both peer reviewers and editors. By supporting both actors in efficiently writing constructive reports or decision letters, LLMs can facilitate higher-quality review and address the shortage of reviewers. However, the fundamental opacity of LLMs’ inner workings and development raises questions and concerns about potential biases and the reliability of review reports. Additionally, as editorial work has a prominent function in defining and shaping epistemic communities, as well as negotiating normative frameworks within such communities, partly outsourcing this work to LLMs might have unforeseen consequences for social and epistemic relations within academia. Regarding performance, we identified major improvements within only a few weeks (between December 2022 and January 2023) and expect ChatGPT to continue improving. CONCLUSIONS: We believe that LLMs are likely to have a profound impact on academia and scholarly communication. While they have the potential to address several current issues within the scholarly communication system, many uncertainties remain and their use is not without risks. In particular, concerns about the amplification of existing biases and inequalities in access to appropriate infrastructure warrant further attention. For the moment, we recommend that if LLMs are used to write scholarly reviews, reviewers disclose their use and accept full responsibility for their reports’ accuracy, tone, reasoning and originality.

Bibliographic Details
Main Authors: Hosseini, Mohammad; Horbach, Serge P.J.M.
Format: Online Article Text
Language: English
Published: American Journal Experts, 2023
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9980209/
https://www.ncbi.nlm.nih.gov/pubmed/36865238
http://dx.doi.org/10.21203/rs.3.rs-2587766/v1
Record ID: pubmed-9980209 (collection: PubMed; record format: MEDLINE/PubMed; source: Res Sq)
Published online: 2023-02-20
License: Creative Commons Attribution 4.0 International (CC BY 4.0), https://creativecommons.org/licenses/by/4.0/. Reusers may distribute, remix, adapt, and build upon the material in any medium or format, including for commercial use, so long as attribution is given to the creator.