Test accuracy of artificial intelligence-based grading of fundus images in diabetic retinopathy screening: A systematic review

OBJECTIVES: To systematically review the accuracy of artificial intelligence (AI)-based systems for grading of fundus images in diabetic retinopathy (DR) screening. METHODS: We searched MEDLINE, EMBASE, the Cochrane Library and the ClinicalTrials.gov from 1st January 2000 to 27th August 2021. Accura...

Bibliographic Details
Main authors: Zhelev, Zhivko; Peters, Jaime; Rogers, Morwenna; Allen, Michael; Kijauskaite, Goda; Seedat, Farah; Wilkinson, Elizabeth; Hyde, Christopher
Format: Online Article Text
Language: English
Published: SAGE Publications 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399100/
https://www.ncbi.nlm.nih.gov/pubmed/36617971
http://dx.doi.org/10.1177/09691413221144382
_version_ 1785084198102499328
author Zhelev, Zhivko
Peters, Jaime
Rogers, Morwenna
Allen, Michael
Kijauskaite, Goda
Seedat, Farah
Wilkinson, Elizabeth
Hyde, Christopher
collection PubMed
description OBJECTIVES: To systematically review the accuracy of artificial intelligence (AI)-based systems for grading of fundus images in diabetic retinopathy (DR) screening. METHODS: We searched MEDLINE, EMBASE, the Cochrane Library and the ClinicalTrials.gov from 1st January 2000 to 27th August 2021. Accuracy studies published in English were included if they met the pre-specified inclusion criteria. Selection of studies for inclusion, data extraction and quality assessment were conducted by one author with a second reviewer independently screening and checking 20% of titles. Results were analysed narratively. RESULTS: Forty-three studies evaluating 15 deep learning (DL) and 4 machine learning (ML) systems were included. Nine systems were evaluated in a single study each. Most studies were judged to be at high or unclear risk of bias in at least one QUADAS-2 domain. Sensitivity for referable DR and higher grades was ≥85% while specificity varied and was <80% for all ML systems and in 6/31 studies evaluating DL systems. Studies reported high accuracy for detection of ungradable images, but the latter were analysed and reported inconsistently. Seven studies reported that AI was more sensitive but less specific than human graders. CONCLUSIONS: AI-based systems are more sensitive than human graders and could be safe to use in clinical practice but have variable specificity. However, for many systems evidence is limited, at high risk of bias and may not generalise across settings. Therefore, pre-implementation assessment in the target clinical pathway is essential to obtain reliable and applicable accuracy estimates.
format Online
Article
Text
id pubmed-10399100
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher SAGE Publications
record_format MEDLINE/PubMed
spelling pubmed-10399100 2023-08-04 J Med Screen Reviews
SAGE Publications 2023-01-09 2023-09 /pmc/articles/PMC10399100/ /pubmed/36617971 http://dx.doi.org/10.1177/09691413221144382 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/This article is distributed under the terms of the Creative Commons Attribution 4.0 License (https://creativecommons.org/licenses/by/4.0/) which permits any use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access page (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title Test accuracy of artificial intelligence-based grading of fundus images in diabetic retinopathy screening: A systematic review
topic Reviews
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10399100/
https://www.ncbi.nlm.nih.gov/pubmed/36617971
http://dx.doi.org/10.1177/09691413221144382