
Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients

OBJECTIVES: Lung ultrasound (LUS) has sparked significant interest during COVID‐19. LUS is based on the detection and analysis of imaging patterns. Vertical artifacts and consolidations are some of the recognized patterns in COVID‐19. However, the interrater reliability (IRR) of these findings has not yet been thoroughly investigated. The goal of this study is to assess the IRR of LUS scoring in COVID‐19 data and to determine how many LUS videos and operators are required to obtain a reliable result. METHODS: A total of 1035 LUS videos from 59 COVID‐19 patients were included. Videos were randomly selected from a dataset of 1807 videos and scored by six human operators (HOs). The videos were also analyzed by artificial intelligence (AI) algorithms. Fleiss' kappa coefficients are presented, evaluated at both the video and prognostic levels. RESULTS: Findings show a stable agreement when evaluating a minimum of 500 videos. The statistical analysis shows that, at the video level, a Fleiss' kappa coefficient of 0.464 (95% confidence interval [CI] = 0.455–0.473) and 0.404 (95% CI = 0.396–0.412) is obtained for pairs of HOs and for AI versus HOs, respectively. At the prognostic level, a Fleiss' kappa coefficient of 0.505 (95% CI = 0.448–0.562) and 0.506 (95% CI = 0.458–0.555) is obtained for pairs of HOs and for AI versus HOs, respectively. CONCLUSIONS: To examine IRR and obtain a reliable evaluation, a minimum of 500 videos is recommended. Moreover, the employed AI algorithms achieve results comparable with those of the HOs. This research further provides a methodology that can be useful to benchmark future LUS studies.
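The agreement statistic reported in the abstract can be computed from a ratings matrix. The following is a minimal, self-contained sketch of Fleiss' kappa, not the authors' implementation; the example matrix, and the assumption of six raters and four scoring categories, are illustrative only.

```python
def fleiss_kappa(counts):
    """Fleiss' kappa for a ratings matrix.

    counts[i][j] = number of raters who assigned category j to item i.
    Every item must be rated by the same number of raters.
    """
    n_items = len(counts)
    n_raters = sum(counts[0])
    n_cats = len(counts[0])

    # Observed per-item agreement P_i, averaged over items
    p_items = [
        (sum(c * c for c in row) - n_raters) / (n_raters * (n_raters - 1))
        for row in counts
    ]
    p_bar = sum(p_items) / n_items

    # Chance agreement P_e from the marginal category proportions
    p_cats = [
        sum(row[j] for row in counts) / (n_items * n_raters)
        for j in range(n_cats)
    ]
    p_e = sum(p * p for p in p_cats)

    return (p_bar - p_e) / (1 - p_e)


# Hypothetical example: three videos, six raters, four categories
ratings = [
    [6, 0, 0, 0],  # all six raters agree on category 0
    [0, 4, 2, 0],  # four raters pick category 1, two pick 2
    [1, 0, 5, 0],  # five raters pick category 2, one picks 0
]
print(round(fleiss_kappa(ratings), 3))  # → 0.554
```

A kappa near 0.46, as in the study, falls in the "moderate agreement" band of the conventional Landis–Koch interpretation.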


Bibliographic Details
Main Authors: Fatima, Noreen, Mento, Federico, Zanforlin, Alessandro, Smargiassi, Andrea, Torri, Elena, Perrone, Tiziano, Demi, Libertario
Format: Online Article Text
Language: English
Published: John Wiley & Sons, Inc. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9350219/
https://www.ncbi.nlm.nih.gov/pubmed/35796343
http://dx.doi.org/10.1002/jum.16052
_version_ 1784762192530243584
author Fatima, Noreen
Mento, Federico
Zanforlin, Alessandro
Smargiassi, Andrea
Torri, Elena
Perrone, Tiziano
Demi, Libertario
author_facet Fatima, Noreen
Mento, Federico
Zanforlin, Alessandro
Smargiassi, Andrea
Torri, Elena
Perrone, Tiziano
Demi, Libertario
author_sort Fatima, Noreen
collection PubMed
description OBJECTIVES: Lung ultrasound (LUS) has sparked significant interest during COVID‐19. LUS is based on the detection and analysis of imaging patterns. Vertical artifacts and consolidations are some of the recognized patterns in COVID‐19. However, the interrater reliability (IRR) of these findings has not yet been thoroughly investigated. The goal of this study is to assess the IRR of LUS scoring in COVID‐19 data and to determine how many LUS videos and operators are required to obtain a reliable result. METHODS: A total of 1035 LUS videos from 59 COVID‐19 patients were included. Videos were randomly selected from a dataset of 1807 videos and scored by six human operators (HOs). The videos were also analyzed by artificial intelligence (AI) algorithms. Fleiss' kappa coefficients are presented, evaluated at both the video and prognostic levels. RESULTS: Findings show a stable agreement when evaluating a minimum of 500 videos. The statistical analysis shows that, at the video level, a Fleiss' kappa coefficient of 0.464 (95% confidence interval [CI] = 0.455–0.473) and 0.404 (95% CI = 0.396–0.412) is obtained for pairs of HOs and for AI versus HOs, respectively. At the prognostic level, a Fleiss' kappa coefficient of 0.505 (95% CI = 0.448–0.562) and 0.506 (95% CI = 0.458–0.555) is obtained for pairs of HOs and for AI versus HOs, respectively. CONCLUSIONS: To examine IRR and obtain a reliable evaluation, a minimum of 500 videos is recommended. Moreover, the employed AI algorithms achieve results comparable with those of the HOs. This research further provides a methodology that can be useful to benchmark future LUS studies.
format Online
Article
Text
id pubmed-9350219
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher John Wiley & Sons, Inc.
record_format MEDLINE/PubMed
spelling pubmed-93502192022-08-04 Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients Fatima, Noreen Mento, Federico Zanforlin, Alessandro Smargiassi, Andrea Torri, Elena Perrone, Tiziano Demi, Libertario J Ultrasound Med Original Articles OBJECTIVES: Lung ultrasound (LUS) has sparked significant interest during COVID‐19. LUS is based on the detection and analysis of imaging patterns. Vertical artifacts and consolidations are some of the recognized patterns in COVID‐19. However, the interrater reliability (IRR) of these findings has not yet been thoroughly investigated. The goal of this study is to assess the IRR of LUS scoring in COVID‐19 data and to determine how many LUS videos and operators are required to obtain a reliable result. METHODS: A total of 1035 LUS videos from 59 COVID‐19 patients were included. Videos were randomly selected from a dataset of 1807 videos and scored by six human operators (HOs). The videos were also analyzed by artificial intelligence (AI) algorithms. Fleiss' kappa coefficients are presented, evaluated at both the video and prognostic levels. RESULTS: Findings show a stable agreement when evaluating a minimum of 500 videos. The statistical analysis shows that, at the video level, a Fleiss' kappa coefficient of 0.464 (95% confidence interval [CI] = 0.455–0.473) and 0.404 (95% CI = 0.396–0.412) is obtained for pairs of HOs and for AI versus HOs, respectively. At the prognostic level, a Fleiss' kappa coefficient of 0.505 (95% CI = 0.448–0.562) and 0.506 (95% CI = 0.458–0.555) is obtained for pairs of HOs and for AI versus HOs, respectively. CONCLUSIONS: To examine IRR and obtain a reliable evaluation, a minimum of 500 videos is recommended. Moreover, the employed AI algorithms achieve results comparable with those of the HOs. This research further provides a methodology that can be useful to benchmark future LUS studies. John Wiley & Sons, Inc.
2022-07-07 /pmc/articles/PMC9350219/ /pubmed/35796343 http://dx.doi.org/10.1002/jum.16052 Text en © 2022 The Authors. Journal of Ultrasound in Medicine published by Wiley Periodicals LLC on behalf of the American Institute of Ultrasound in Medicine. This is an open access article under the terms of the Creative Commons Attribution‐NonCommercial‐NoDerivs License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial, and no modifications or adaptations are made.
spellingShingle Original Articles
Fatima, Noreen
Mento, Federico
Zanforlin, Alessandro
Smargiassi, Andrea
Torri, Elena
Perrone, Tiziano
Demi, Libertario
Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients
title Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients
title_full Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients
title_fullStr Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients
title_full_unstemmed Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients
title_short Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients
title_sort human‐to‐ai interrater agreement for lung ultrasound scoring in covid‐19 patients
topic Original Articles
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9350219/
https://www.ncbi.nlm.nih.gov/pubmed/35796343
http://dx.doi.org/10.1002/jum.16052
work_keys_str_mv AT fatimanoreen humantoaiinterrateragreementforlungultrasoundscoringincovid19patients
AT mentofederico humantoaiinterrateragreementforlungultrasoundscoringincovid19patients
AT zanforlinalessandro humantoaiinterrateragreementforlungultrasoundscoringincovid19patients
AT smargiassiandrea humantoaiinterrateragreementforlungultrasoundscoringincovid19patients
AT torrielena humantoaiinterrateragreementforlungultrasoundscoringincovid19patients
AT perronetiziano humantoaiinterrateragreementforlungultrasoundscoringincovid19patients
AT demilibertario humantoaiinterrateragreementforlungultrasoundscoringincovid19patients