Cargando…

Utilizing a Digital Swarm Intelligence Platform to Improve Consensus Among Radiologists and Exploring Its Applications

Radiologists today play a central role in making diagnostic decisions and labeling images for training and benchmarking artificial intelligence (AI) algorithms. A key concern is low inter-reader reliability (IRR) seen between experts when interpreting challenging cases. While team-based decisions ar...

Descripción completa

Detalles Bibliográficos
Autores principales: Shah, Rutwik, Astuto Arouche Nunes, Bruno, Gleason, Tyler, Fletcher, Will, Banaga, Justin, Sweetwood, Kevin, Ye, Allen, Patel, Rina, McGill, Kevin, Link, Thomas, Crane, Jason, Pedoia, Valentina, Majumdar, Sharmila
Formato: Online Artículo Texto
Lenguaje:English
Publicado: Springer International Publishing 2022
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10039189/
https://www.ncbi.nlm.nih.gov/pubmed/36414832
http://dx.doi.org/10.1007/s10278-022-00662-3
Descripción
Sumario:Radiologists today play a central role in making diagnostic decisions and labeling images for training and benchmarking artificial intelligence (AI) algorithms. A key concern is low inter-reader reliability (IRR) seen between experts when interpreting challenging cases. While team-based decisions are known to outperform individual decisions, inter-personal biases often creep up in group interactions which limit nondominant participants from expressing true opinions. To overcome the dual problems of low consensus and interpersonal bias, we explored a solution modeled on bee swarms. Two separate cohorts, three board-certified radiologists, (cohort 1), and five radiology residents (cohort 2) collaborated on a digital swarm platform in real time and in a blinded fashion, grading meniscal lesions on knee MR exams. These consensus votes were benchmarked against clinical (arthroscopy) and radiological (senior-most radiologist) standards of reference using Cohen’s kappa. The IRR of the consensus votes was then compared to the IRR of the majority and most confident votes of the two cohorts. IRR was also calculated for predictions from a meniscal lesion detecting AI algorithm. The attending cohort saw an improvement of 23% in IRR of swarm votes (k = 0.34) over majority vote (k = 0.11). Similar improvement of 23% in IRR (k = 0.25) in 3-resident swarm votes over majority vote (k = 0.02) was observed. The 5-resident swarm had an even higher improvement of 30% in IRR (k = 0.37) over majority vote (k = 0.07). The swarm consensus votes outperformed individual and majority vote decision in both the radiologists and resident cohorts. The attending and resident swarms also outperformed predictions from a state-of-the-art AI algorithm. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1007/s10278-022-00662-3.