Cargando…

iERM: An Interpretable Deep Learning System to Classify Epiretinal Membrane for Different Optical Coherence Tomography Devices: A Multi-Center Analysis

Background: Epiretinal membranes (ERM) have been found to be common among individuals >50 years old. However, the severity grading assessment for ERM based on optical coherence tomography (OCT) images has remained a challenge due to lacking reliable and interpretable analysis methods. Thus, this...

Descripción completa

Detalles Bibliográficos
Autores principales: Jin, Kai, Yan, Yan, Wang, Shuai, Yang, Ce, Chen, Menglu, Liu, Xindi, Terasaki, Hiroto, Yeo, Tun-Hang, Singh, Neha Gulab, Wang, Yao, Ye, Juan
Formato: Online Artículo Texto
Lenguaje:English
Publicado: MDPI 2023
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9862104/
https://www.ncbi.nlm.nih.gov/pubmed/36675327
http://dx.doi.org/10.3390/jcm12020400
Descripción
Sumario:Background: Epiretinal membranes (ERM) have been found to be common among individuals >50 years old. However, the severity grading assessment for ERM based on optical coherence tomography (OCT) images has remained a challenge due to lacking reliable and interpretable analysis methods. Thus, this study aimed to develop a two-stage deep learning (DL) system named iERM to provide accurate automatic grading of ERM for clinical practice. Methods: The iERM was trained based on human segmentation of key features to improve classification performance and simultaneously provide interpretability to the classification results. We developed and tested iERM using a total of 4547 OCT B-Scans of four different commercial OCT devices that were collected from nine international medical centers. Results: As per the results, the integrated network effectively improved the grading performance by 1–5.9% compared with the traditional classification DL model and achieved high accuracy scores of 82.9%, 87.0%, and 79.4% in the internal test dataset and two external test datasets, respectively. This is comparable to retinal specialists whose average accuracy scores are 87.8% and 79.4% in two external test datasets. Conclusion: This study proved to be a benchmark method to improve the performance and enhance the interpretability of the traditional DL model with the implementation of segmentation based on prior human knowledge. It may have the potential to provide precise guidance for ERM diagnosis and treatment.