Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients
OBJECTIVES: Lung ultrasound (LUS) has sparked significant interest during COVID‐19. LUS is based on the detection and analysis of imaging patterns. Vertical artifacts and consolidations are some of the recognized patterns in COVID‐19. However, the interrater reliability (IRR) of these findings has not yet been thoroughly investigated. The goal of this study is to assess IRR in LUS COVID‐19 data and to determine how many LUS videos and operators are required to obtain a reliable result.
Main Authors: | Fatima, Noreen; Mento, Federico; Zanforlin, Alessandro; Smargiassi, Andrea; Torri, Elena; Perrone, Tiziano; Demi, Libertario |
Format: | Online Article Text |
Language: | English |
Published: | John Wiley & Sons, Inc., 2022 |
Subjects: | Original Articles |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9350219/ https://www.ncbi.nlm.nih.gov/pubmed/35796343 http://dx.doi.org/10.1002/jum.16052 |
_version_ | 1784762192530243584 |
author | Fatima, Noreen Mento, Federico Zanforlin, Alessandro Smargiassi, Andrea Torri, Elena Perrone, Tiziano Demi, Libertario |
author_facet | Fatima, Noreen Mento, Federico Zanforlin, Alessandro Smargiassi, Andrea Torri, Elena Perrone, Tiziano Demi, Libertario |
author_sort | Fatima, Noreen |
collection | PubMed |
description | OBJECTIVES: Lung ultrasound (LUS) has sparked significant interest during COVID‐19. LUS is based on the detection and analysis of imaging patterns. Vertical artifacts and consolidations are some of the recognized patterns in COVID‐19. However, the interrater reliability (IRR) of these findings has not yet been thoroughly investigated. The goal of this study is to assess IRR in LUS COVID‐19 data and to determine how many LUS videos and operators are required to obtain a reliable result. METHODS: A total of 1035 LUS videos from 59 COVID‐19 patients were included. Videos were randomly selected from a dataset of 1807 videos and scored by six human operators (HOs). The videos were also analyzed by artificial intelligence (AI) algorithms. Fleiss' kappa coefficient results are presented, evaluated at both the video and the prognostic level. RESULTS: Findings show stable agreement once a minimum of 500 videos is evaluated. The statistical analysis shows that, at the video level, a Fleiss' kappa coefficient of 0.464 (95% confidence interval [CI] = 0.455–0.473) and 0.404 (95% CI = 0.396–0.412) is obtained for pairs of HOs and for AI versus HOs, respectively. At the prognostic level, a Fleiss' kappa coefficient of 0.505 (95% CI = 0.448–0.562) and 0.506 (95% CI = 0.458–0.555) is obtained for pairs of HOs and for AI versus HOs, respectively. CONCLUSIONS: To examine IRR and obtain a reliable evaluation, a minimum of 500 videos is recommended. Moreover, the employed AI algorithms achieve results comparable with those of HOs. This research further provides a methodology that can be used to benchmark future LUS studies. |
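The description above reports agreement as Fleiss' kappa. For readers unfamiliar with the statistic, the sketch below shows how it is computed from a table of rating counts; the function name, the four-level severity scale, and the toy data are illustrative assumptions and are not taken from the study's own code.

```python
# Minimal sketch of Fleiss' kappa, the agreement statistic quoted in the
# abstract. All names and data here are illustrative, not the study's code.
import numpy as np

def fleiss_kappa(counts: np.ndarray) -> float:
    """counts[i, j] = number of raters who assigned category j to item i.
    Every row must sum to the same number of raters n."""
    n = int(counts[0].sum())                    # raters per item
    assert (counts.sum(axis=1) == n).all()      # same rater count for every item
    N = counts.shape[0]                         # number of items (e.g. LUS videos)
    p_j = counts.sum(axis=0) / (N * n)          # marginal proportion per category
    P_i = (np.square(counts).sum(axis=1) - n) / (n * (n - 1))  # per-item agreement
    P_bar = P_i.mean()                          # mean observed agreement
    P_e = np.square(p_j).sum()                  # agreement expected by chance
    return float((P_bar - P_e) / (1 - P_e))

# Toy example: 5 videos scored by 6 raters on a 4-level severity scale (0-3).
ratings = np.array([
    [6, 0, 0, 0],
    [0, 4, 2, 0],
    [1, 1, 3, 1],
    [0, 0, 2, 4],
    [3, 3, 0, 0],
])
print(f"kappa = {fleiss_kappa(ratings):.3f}")
```

By the common Landis and Koch convention, kappa values in the 0.41–0.60 range, such as those reported here, indicate moderate agreement.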
format | Online Article Text |
id | pubmed-9350219 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | John Wiley & Sons, Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9350219 2022-08-04 Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients Fatima, Noreen Mento, Federico Zanforlin, Alessandro Smargiassi, Andrea Torri, Elena Perrone, Tiziano Demi, Libertario J Ultrasound Med Original Articles OBJECTIVES: Lung ultrasound (LUS) has sparked significant interest during COVID‐19. LUS is based on the detection and analysis of imaging patterns. Vertical artifacts and consolidations are some of the recognized patterns in COVID‐19. However, the interrater reliability (IRR) of these findings has not yet been thoroughly investigated. The goal of this study is to assess IRR in LUS COVID‐19 data and to determine how many LUS videos and operators are required to obtain a reliable result. METHODS: A total of 1035 LUS videos from 59 COVID‐19 patients were included. Videos were randomly selected from a dataset of 1807 videos and scored by six human operators (HOs). The videos were also analyzed by artificial intelligence (AI) algorithms. Fleiss' kappa coefficient results are presented, evaluated at both the video and the prognostic level. RESULTS: Findings show stable agreement once a minimum of 500 videos is evaluated. The statistical analysis shows that, at the video level, a Fleiss' kappa coefficient of 0.464 (95% confidence interval [CI] = 0.455–0.473) and 0.404 (95% CI = 0.396–0.412) is obtained for pairs of HOs and for AI versus HOs, respectively. At the prognostic level, a Fleiss' kappa coefficient of 0.505 (95% CI = 0.448–0.562) and 0.506 (95% CI = 0.458–0.555) is obtained for pairs of HOs and for AI versus HOs, respectively. CONCLUSIONS: To examine IRR and obtain a reliable evaluation, a minimum of 500 videos is recommended. Moreover, the employed AI algorithms achieve results comparable with those of HOs. This research further provides a methodology that can be used to benchmark future LUS studies. John Wiley & Sons, Inc. 2022-07-07 /pmc/articles/PMC9350219/ /pubmed/35796343 http://dx.doi.org/10.1002/jum.16052 Text en © 2022 The Authors. Journal of Ultrasound in Medicine published by Wiley Periodicals LLC on behalf of American Institute of Ultrasound in Medicine. This is an open access article under the terms of the CC BY-NC-ND 4.0 License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which permits use and distribution in any medium, provided the original work is properly cited, the use is non‐commercial and no modifications or adaptations are made. |
spellingShingle | Original Articles Fatima, Noreen Mento, Federico Zanforlin, Alessandro Smargiassi, Andrea Torri, Elena Perrone, Tiziano Demi, Libertario Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients |
title | Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients |
title_full | Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients |
title_fullStr | Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients |
title_full_unstemmed | Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients |
title_short | Human‐to‐AI Interrater Agreement for Lung Ultrasound Scoring in COVID‐19 Patients |
title_sort | human‐to‐ai interrater agreement for lung ultrasound scoring in covid‐19 patients |
topic | Original Articles |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9350219/ https://www.ncbi.nlm.nih.gov/pubmed/35796343 http://dx.doi.org/10.1002/jum.16052 |
work_keys_str_mv | AT fatimanoreen humantoaiinterrateragreementforlungultrasoundscoringincovid19patients AT mentofederico humantoaiinterrateragreementforlungultrasoundscoringincovid19patients AT zanforlinalessandro humantoaiinterrateragreementforlungultrasoundscoringincovid19patients AT smargiassiandrea humantoaiinterrateragreementforlungultrasoundscoringincovid19patients AT torrielena humantoaiinterrateragreementforlungultrasoundscoringincovid19patients AT perronetiziano humantoaiinterrateragreementforlungultrasoundscoringincovid19patients AT demilibertario humantoaiinterrateragreementforlungultrasoundscoringincovid19patients |