Automatic Personalized Impression Generation for PET Reports Using Large Language Models
PURPOSE: To determine if fine-tuned large language models (LLMs) can generate accurate, personalized impressions for whole-body PET reports. MATERIALS AND METHODS: Twelve language models were trained on a corpus of PET reports using the teacher-forcing algorithm, with the report findings as input and the clinical impressions as reference. …
Main Authors: | Tie, Xin; Shin, Muheon; Pirasteh, Ali; Ibrahim, Nevein; Huemann, Zachary; Castellino, Sharon M.; Kelly, Kara M.; Garrett, John; Hu, Junjie; Cho, Steve Y.; Bradshaw, Tyler J. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cornell University 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10614982/ https://www.ncbi.nlm.nih.gov/pubmed/37904738 |
_version_ | 1785129130258333696 |
---|---|
author | Tie, Xin Shin, Muheon Pirasteh, Ali Ibrahim, Nevein Huemann, Zachary Castellino, Sharon M. Kelly, Kara M. Garrett, John Hu, Junjie Cho, Steve Y. Bradshaw, Tyler J. |
author_facet | Tie, Xin Shin, Muheon Pirasteh, Ali Ibrahim, Nevein Huemann, Zachary Castellino, Sharon M. Kelly, Kara M. Garrett, John Hu, Junjie Cho, Steve Y. Bradshaw, Tyler J. |
author_sort | Tie, Xin |
collection | PubMed |
description | PURPOSE: To determine if fine-tuned large language models (LLMs) can generate accurate, personalized impressions for whole-body PET reports. MATERIALS AND METHODS: Twelve language models were trained on a corpus of PET reports using the teacher-forcing algorithm, with the report findings as input and the clinical impressions as reference. An extra input token encodes the reading physician’s identity, allowing models to learn physician-specific reporting styles. Our corpus comprised 37,370 retrospective PET reports collected from our institution between 2010 and 2022. To identify the best LLM, 30 evaluation metrics were benchmarked against quality scores from two nuclear medicine (NM) physicians, with the most aligned metrics selecting the model for expert evaluation. In a subset of data, model-generated impressions and original clinical impressions were assessed by three NM physicians according to 6 quality dimensions (3-point scale) and an overall utility score (5-point scale). Each physician reviewed 12 of their own reports and 12 reports from other physicians. Bootstrap resampling was used for statistical analysis. RESULTS: Of all evaluation metrics, domain-adapted BARTScore and PEGASUSScore showed the highest Spearman’s ρ correlations (ρ=0.568 and 0.563) with physician preferences. Based on these metrics, the fine-tuned PEGASUS model was selected as the top LLM. When physicians reviewed PEGASUS-generated impressions in their own style, 89% were considered clinically acceptable, with a mean utility score of 4.08 out of 5. Physicians rated these personalized impressions as comparable in overall utility to the impressions dictated by other physicians (4.03, P=0.41). CONCLUSION: Personalized impressions generated by PEGASUS were clinically useful, highlighting its potential to expedite PET reporting. |
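The abstract describes two techniques: prepending an extra input token that encodes the reading physician's identity so the model learns physician-specific styles, and selecting the best evaluation metric by its Spearman's ρ correlation with physician quality scores. The sketch below illustrates both ideas; it is not the authors' implementation, and names such as `build_model_input` and the `[PHYSICIAN_n]` token format are hypothetical.

```python
def build_model_input(physician_id: int, findings: str) -> str:
    """Prepend a hypothetical identity token (e.g. '[PHYSICIAN_7]') to the
    report findings, so a fine-tuned summarizer can condition its output
    style on the reading physician."""
    return f"[PHYSICIAN_{physician_id}] {findings}"

def _ranks(values):
    """1-based average ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # average of 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman_rho(xs, ys):
    """Spearman's rho: Pearson correlation computed on the ranks."""
    rx, ry = _ranks(xs), _ranks(ys)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)

# Toy metric selection: keep the candidate metric whose automated scores
# correlate best with physician quality scores (all numbers invented).
physician_scores = [3, 1, 4, 2, 5]
metric_a = [2.9, 1.2, 3.8, 2.0, 4.6]  # well-aligned candidate
metric_b = [1.0, 3.0, 2.0, 5.0, 4.0]  # poorly aligned candidate
best = max([("A", metric_a), ("B", metric_b)],
           key=lambda m: spearman_rho(m[1], physician_scores))
```

In the study this selection role was played by domain-adapted BARTScore and PEGASUSScore (ρ = 0.568 and 0.563 against physician preferences), which in turn identified the fine-tuned PEGASUS model as the top LLM.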
format | Online Article Text |
id | pubmed-10614982 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cornell University |
record_format | MEDLINE/PubMed |
spelling | pubmed-10614982 2023-10-31 Automatic Personalized Impression Generation for PET Reports Using Large Language Models Tie, Xin Shin, Muheon Pirasteh, Ali Ibrahim, Nevein Huemann, Zachary Castellino, Sharon M. Kelly, Kara M. Garrett, John Hu, Junjie Cho, Steve Y. Bradshaw, Tyler J. ArXiv Article Cornell University 2023-10-17 /pmc/articles/PMC10614982/ /pubmed/37904738 Text en https://creativecommons.org/licenses/by/4.0/ This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which allows reusers to distribute, remix, adapt, and build upon the material in any medium or format, so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Tie, Xin Shin, Muheon Pirasteh, Ali Ibrahim, Nevein Huemann, Zachary Castellino, Sharon M. Kelly, Kara M. Garrett, John Hu, Junjie Cho, Steve Y. Bradshaw, Tyler J. Automatic Personalized Impression Generation for PET Reports Using Large Language Models |
title | Automatic Personalized Impression Generation for PET Reports Using Large Language Models |
title_full | Automatic Personalized Impression Generation for PET Reports Using Large Language Models |
title_fullStr | Automatic Personalized Impression Generation for PET Reports Using Large Language Models |
title_full_unstemmed | Automatic Personalized Impression Generation for PET Reports Using Large Language Models |
title_short | Automatic Personalized Impression Generation for PET Reports Using Large Language Models |
title_sort | automatic personalized impression generation for pet reports using large language models |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10614982/ https://www.ncbi.nlm.nih.gov/pubmed/37904738 |
work_keys_str_mv | AT tiexin automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT shinmuheon automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT pirastehali automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT ibrahimnevein automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT huemannzachary automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT castellinosharonm automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT kellykaram automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT garrettjohn automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT hujunjie automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT chostevey automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels AT bradshawtylerj automaticpersonalizedimpressiongenerationforpetreportsusinglargelanguagemodels |