
Evaluating progress in automatic chest X-ray radiology report generation


Bibliographic Details
Main Authors: Yu, Feiyang; Endo, Mark; Krishnan, Rayan; Pan, Ian; Tsai, Andy; Reis, Eduardo Pontes; Fonseca, Eduardo Kaiser Ururahy Nunes; Lee, Henrique Min Ho; Abad, Zahra Shakeri Hossein; Ng, Andrew Y.; Langlotz, Curtis P.; Venugopal, Vasantha Kumar; Rajpurkar, Pranav
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10499844/
https://www.ncbi.nlm.nih.gov/pubmed/37720336
http://dx.doi.org/10.1016/j.patter.2023.100802
Description
Summary: Artificial intelligence (AI) models for automatic generation of narrative radiology reports from images have the potential to enhance efficiency and reduce the workload of radiologists. However, evaluating the correctness of these reports requires metrics that can capture clinically pertinent differences. In this study, we investigate the alignment between automated metrics and radiologists' scoring of errors in report generation. We address the limitations of existing metrics by proposing new metrics, RadGraph F1 and RadCliQ, which demonstrate stronger correlation with radiologists' evaluations. In addition, we analyze the failure modes of the metrics to understand their limitations and provide guidance for metric selection and interpretation. This study establishes RadGraph F1 and RadCliQ as meaningful metrics for guiding future research in radiology report generation.