
The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts

The emergence of ChatGPT has sensitized the general public, including the legal profession, to large language models' (LLMs) potential uses (e.g., document drafting, question answering, and summarization). Although recent studies have shown how well the technology performs in diverse semantic annotation tasks focused on legal texts, an influx of newer, more capable (GPT-4) or cost-effective (GPT-3.5-turbo) models requires another analysis. This paper addresses recent developments in the ability of LLMs to semantically annotate legal texts in zero-shot learning settings. Given the transition to mature generative AI systems, we examine the performance of GPT-4 and GPT-3.5-turbo(-16k), comparing it to the previous generation of GPT models, on three legal text annotation tasks involving diverse documents such as adjudicatory opinions, contractual clauses, or statutory provisions. We also compare the models' performance and cost to better understand the trade-offs. We found that the GPT-4 model clearly outperforms the GPT-3.5 models on two of the three tasks. The cost-effective GPT-3.5-turbo matches the performance of the 20× more expensive text-davinci-003 model. While one can annotate multiple data points within a single prompt, the performance degrades as the size of the batch increases. This work provides valuable information relevant for many practical applications (e.g., in contract review) and research projects (e.g., in empirical legal studies). Legal scholars and practicing lawyers alike can leverage these findings to guide their decisions in integrating LLMs in a wide range of workflows involving semantic annotation of legal texts.


Bibliographic Details
Main Authors: Savelka, Jaromir, Ashley, Kevin D.
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10690809/
https://www.ncbi.nlm.nih.gov/pubmed/38045764
http://dx.doi.org/10.3389/frai.2023.1279794
_version_ 1785152600970100736
author Savelka, Jaromir
Ashley, Kevin D.
author_facet Savelka, Jaromir
Ashley, Kevin D.
author_sort Savelka, Jaromir
collection PubMed
description The emergence of ChatGPT has sensitized the general public, including the legal profession, to large language models' (LLMs) potential uses (e.g., document drafting, question answering, and summarization). Although recent studies have shown how well the technology performs in diverse semantic annotation tasks focused on legal texts, an influx of newer, more capable (GPT-4) or cost-effective (GPT-3.5-turbo) models requires another analysis. This paper addresses recent developments in the ability of LLMs to semantically annotate legal texts in zero-shot learning settings. Given the transition to mature generative AI systems, we examine the performance of GPT-4 and GPT-3.5-turbo(-16k), comparing it to the previous generation of GPT models, on three legal text annotation tasks involving diverse documents such as adjudicatory opinions, contractual clauses, or statutory provisions. We also compare the models' performance and cost to better understand the trade-offs. We found that the GPT-4 model clearly outperforms the GPT-3.5 models on two of the three tasks. The cost-effective GPT-3.5-turbo matches the performance of the 20× more expensive text-davinci-003 model. While one can annotate multiple data points within a single prompt, the performance degrades as the size of the batch increases. This work provides valuable information relevant for many practical applications (e.g., in contract review) and research projects (e.g., in empirical legal studies). Legal scholars and practicing lawyers alike can leverage these findings to guide their decisions in integrating LLMs in a wide range of workflows involving semantic annotation of legal texts.
format Online
Article
Text
id pubmed-10690809
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-106908092023-12-02 The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts Savelka, Jaromir Ashley, Kevin D. Front Artif Intell Artificial Intelligence The emergence of ChatGPT has sensitized the general public, including the legal profession, to large language models' (LLMs) potential uses (e.g., document drafting, question answering, and summarization). Although recent studies have shown how well the technology performs in diverse semantic annotation tasks focused on legal texts, an influx of newer, more capable (GPT-4) or cost-effective (GPT-3.5-turbo) models requires another analysis. This paper addresses recent developments in the ability of LLMs to semantically annotate legal texts in zero-shot learning settings. Given the transition to mature generative AI systems, we examine the performance of GPT-4 and GPT-3.5-turbo(-16k), comparing it to the previous generation of GPT models, on three legal text annotation tasks involving diverse documents such as adjudicatory opinions, contractual clauses, or statutory provisions. We also compare the models' performance and cost to better understand the trade-offs. We found that the GPT-4 model clearly outperforms the GPT-3.5 models on two of the three tasks. The cost-effective GPT-3.5-turbo matches the performance of the 20× more expensive text-davinci-003 model. While one can annotate multiple data points within a single prompt, the performance degrades as the size of the batch increases. This work provides valuable information relevant for many practical applications (e.g., in contract review) and research projects (e.g., in empirical legal studies). Legal scholars and practicing lawyers alike can leverage these findings to guide their decisions in integrating LLMs in a wide range of workflows involving semantic annotation of legal texts. Frontiers Media S.A. 
2023-11-17 /pmc/articles/PMC10690809/ /pubmed/38045764 http://dx.doi.org/10.3389/frai.2023.1279794 Text en Copyright © 2023 Savelka and Ashley. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Artificial Intelligence
Savelka, Jaromir
Ashley, Kevin D.
The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts
title The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts
title_full The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts
title_fullStr The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts
title_full_unstemmed The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts
title_short The unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts
title_sort unreasonable effectiveness of large language models in zero-shot semantic annotation of legal texts
topic Artificial Intelligence
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10690809/
https://www.ncbi.nlm.nih.gov/pubmed/38045764
http://dx.doi.org/10.3389/frai.2023.1279794
work_keys_str_mv AT savelkajaromir theunreasonableeffectivenessoflargelanguagemodelsinzeroshotsemanticannotationoflegaltexts
AT ashleykevind theunreasonableeffectivenessoflargelanguagemodelsinzeroshotsemanticannotationoflegaltexts
AT savelkajaromir unreasonableeffectivenessoflargelanguagemodelsinzeroshotsemanticannotationoflegaltexts
AT ashleykevind unreasonableeffectivenessoflargelanguagemodelsinzeroshotsemanticannotationoflegaltexts