Performance of a Deep Learning Model vs Human Reviewers in Grading Endoscopic Disease Severity of Patients With Ulcerative Colitis

IMPORTANCE: Assessing endoscopic disease severity in ulcerative colitis (UC) is a key element in determining therapeutic response, but its use in clinical practice is limited by the requirement for experienced human reviewers. OBJECTIVE: To determine whether deep learning models can grade the endosc...

Bibliographic Details
Main Authors:	Stidham, Ryan W.; Liu, Wenshuo; Bishu, Shrinivas; Rice, Michael D.; Higgins, Peter D. R.; Zhu, Ji; Nallamothu, Brahmajee K.; Waljee, Akbar K.
Format:	Online Article Text
Language:	English
Published:	American Medical Association 2019
Subjects:	Original Investigation
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6537821/ https://www.ncbi.nlm.nih.gov/pubmed/31099869 http://dx.doi.org/10.1001/jamanetworkopen.2019.3963

_version_	1783422088933015552
author	Stidham, Ryan W.; Liu, Wenshuo; Bishu, Shrinivas; Rice, Michael D.; Higgins, Peter D. R.; Zhu, Ji; Nallamothu, Brahmajee K.; Waljee, Akbar K.
author_facet	Stidham, Ryan W.; Liu, Wenshuo; Bishu, Shrinivas; Rice, Michael D.; Higgins, Peter D. R.; Zhu, Ji; Nallamothu, Brahmajee K.; Waljee, Akbar K.
author_sort	Stidham, Ryan W.
collection	PubMed
description	IMPORTANCE: Assessing endoscopic disease severity in ulcerative colitis (UC) is a key element in determining therapeutic response, but its use in clinical practice is limited by the requirement for experienced human reviewers. OBJECTIVE: To determine whether deep learning models can grade the endoscopic severity of UC as well as experienced human reviewers. DESIGN, SETTING, AND PARTICIPANTS: In this diagnostic study, retrospective grading of endoscopic images using the 4-level Mayo subscore was performed by 2 independent reviewers with score discrepancies adjudicated by a third reviewer. Using 16 514 images from 3082 patients with UC who underwent colonoscopy at a single tertiary care referral center in the United States between January 1, 2007, and December 31, 2017, a 159-layer convolutional neural network (CNN) was constructed as a deep learning model to train and categorize images into 2 clinically relevant groups: remission (Mayo subscore 0 or 1) and moderate to severe disease (Mayo subscore 2 or 3). Ninety percent of the cohort was used to build the model and 10% was used to test it; the process was repeated 10 times. A set of 30 full-motion colonoscopy videos, unseen by the model, was then used for external validation to mimic real-world application. MAIN OUTCOMES AND MEASURES: Model performance was assessed using area under the receiver operating characteristic curve (AUROC), sensitivity and specificity, positive predictive value (PPV), and negative predictive value (NPV). Kappa statistics (κ) were used to measure agreement of the CNN relative to adjudicated human reference scores. RESULTS: The authors included 16 514 images from 3082 unique patients (median [IQR] age, 41.3 [26.1-61.8] years, 1678 [54.4%] female), with 3980 images (24.1%) classified as moderate-to-severe disease by the adjudicated reference score. The CNN was excellent for distinguishing endoscopic remission from moderate-to-severe disease with an AUROC of 0.966 (95% CI, 0.967-0.972); a PPV of 0.87 (95% CI, 0.85-0.88) with a sensitivity of 83.0% (95% CI, 80.8%-85.4%) and specificity of 96.0% (95% CI, 95.1%-97.1%); and NPV of 0.94 (95% CI, 0.93-0.95). Weighted κ agreement between the CNN and the adjudicated reference score was also good for identifying exact Mayo subscores (κ = 0.84; 95% CI, 0.83-0.86) and was similar to the agreement between experienced reviewers (κ = 0.86; 95% CI, 0.85-0.87). Applying the CNN to entire colonoscopy videos had similar accuracy for identifying moderate to severe disease (AUROC, 0.97; 95% CI, 0.963-0.969). CONCLUSIONS AND RELEVANCE: This study found that deep learning model performance was similar to experienced human reviewers in grading endoscopic severity of UC. Given its scalability, this approach could improve the use of colonoscopy for UC in both research and routine practice.
format	Online Article Text
id	pubmed-6537821
institution	National Center for Biotechnology Information
language	English
publishDate	2019
publisher	American Medical Association
record_format	MEDLINE/PubMed
spelling	pubmed-6537821 2019-06-12 Performance of a Deep Learning Model vs Human Reviewers in Grading Endoscopic Disease Severity of Patients With Ulcerative Colitis Stidham, Ryan W. Liu, Wenshuo Bishu, Shrinivas Rice, Michael D. Higgins, Peter D. R. Zhu, Ji Nallamothu, Brahmajee K. Waljee, Akbar K. JAMA Netw Open Original Investigation IMPORTANCE: Assessing endoscopic disease severity in ulcerative colitis (UC) is a key element in determining therapeutic response, but its use in clinical practice is limited by the requirement for experienced human reviewers. OBJECTIVE: To determine whether deep learning models can grade the endoscopic severity of UC as well as experienced human reviewers. DESIGN, SETTING, AND PARTICIPANTS: In this diagnostic study, retrospective grading of endoscopic images using the 4-level Mayo subscore was performed by 2 independent reviewers with score discrepancies adjudicated by a third reviewer. Using 16 514 images from 3082 patients with UC who underwent colonoscopy at a single tertiary care referral center in the United States between January 1, 2007, and December 31, 2017, a 159-layer convolutional neural network (CNN) was constructed as a deep learning model to train and categorize images into 2 clinically relevant groups: remission (Mayo subscore 0 or 1) and moderate to severe disease (Mayo subscore 2 or 3). Ninety percent of the cohort was used to build the model and 10% was used to test it; the process was repeated 10 times. A set of 30 full-motion colonoscopy videos, unseen by the model, was then used for external validation to mimic real-world application. MAIN OUTCOMES AND MEASURES: Model performance was assessed using area under the receiver operating characteristic curve (AUROC), sensitivity and specificity, positive predictive value (PPV), and negative predictive value (NPV). Kappa statistics (κ) were used to measure agreement of the CNN relative to adjudicated human reference scores. RESULTS: The authors included 16 514 images from 3082 unique patients (median [IQR] age, 41.3 [26.1-61.8] years, 1678 [54.4%] female), with 3980 images (24.1%) classified as moderate-to-severe disease by the adjudicated reference score. The CNN was excellent for distinguishing endoscopic remission from moderate-to-severe disease with an AUROC of 0.966 (95% CI, 0.967-0.972); a PPV of 0.87 (95% CI, 0.85-0.88) with a sensitivity of 83.0% (95% CI, 80.8%-85.4%) and specificity of 96.0% (95% CI, 95.1%-97.1%); and NPV of 0.94 (95% CI, 0.93-0.95). Weighted κ agreement between the CNN and the adjudicated reference score was also good for identifying exact Mayo subscores (κ = 0.84; 95% CI, 0.83-0.86) and was similar to the agreement between experienced reviewers (κ = 0.86; 95% CI, 0.85-0.87). Applying the CNN to entire colonoscopy videos had similar accuracy for identifying moderate to severe disease (AUROC, 0.97; 95% CI, 0.963-0.969). CONCLUSIONS AND RELEVANCE: This study found that deep learning model performance was similar to experienced human reviewers in grading endoscopic severity of UC. Given its scalability, this approach could improve the use of colonoscopy for UC in both research and routine practice. American Medical Association 2019-05-17 /pmc/articles/PMC6537821/ /pubmed/31099869 http://dx.doi.org/10.1001/jamanetworkopen.2019.3963 Text en Copyright 2019 Stidham RW et al. JAMA Network Open. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the CC-BY License.
spellingShingle	Original Investigation Stidham, Ryan W. Liu, Wenshuo Bishu, Shrinivas Rice, Michael D. Higgins, Peter D. R. Zhu, Ji Nallamothu, Brahmajee K. Waljee, Akbar K. Performance of a Deep Learning Model vs Human Reviewers in Grading Endoscopic Disease Severity of Patients With Ulcerative Colitis
title	Performance of a Deep Learning Model vs Human Reviewers in Grading Endoscopic Disease Severity of Patients With Ulcerative Colitis
title_full	Performance of a Deep Learning Model vs Human Reviewers in Grading Endoscopic Disease Severity of Patients With Ulcerative Colitis
title_fullStr	Performance of a Deep Learning Model vs Human Reviewers in Grading Endoscopic Disease Severity of Patients With Ulcerative Colitis
title_full_unstemmed	Performance of a Deep Learning Model vs Human Reviewers in Grading Endoscopic Disease Severity of Patients With Ulcerative Colitis
title_short	Performance of a Deep Learning Model vs Human Reviewers in Grading Endoscopic Disease Severity of Patients With Ulcerative Colitis
title_sort	performance of a deep learning model vs human reviewers in grading endoscopic disease severity of patients with ulcerative colitis
topic	Original Investigation
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6537821/ https://www.ncbi.nlm.nih.gov/pubmed/31099869 http://dx.doi.org/10.1001/jamanetworkopen.2019.3963
work_keys_str_mv	AT stidhamryanw performanceofadeeplearningmodelvshumanreviewersingradingendoscopicdiseaseseverityofpatientswithulcerativecolitis AT liuwenshuo performanceofadeeplearningmodelvshumanreviewersingradingendoscopicdiseaseseverityofpatientswithulcerativecolitis AT bishushrinivas performanceofadeeplearningmodelvshumanreviewersingradingendoscopicdiseaseseverityofpatientswithulcerativecolitis AT ricemichaeld performanceofadeeplearningmodelvshumanreviewersingradingendoscopicdiseaseseverityofpatientswithulcerativecolitis AT higginspeterdr performanceofadeeplearningmodelvshumanreviewersingradingendoscopicdiseaseseverityofpatientswithulcerativecolitis AT zhuji performanceofadeeplearningmodelvshumanreviewersingradingendoscopicdiseaseseverityofpatientswithulcerativecolitis AT nallamothubrahmajeek performanceofadeeplearningmodelvshumanreviewersingradingendoscopicdiseaseseverityofpatientswithulcerativecolitis AT waljeeakbark performanceofadeeplearningmodelvshumanreviewersingradingendoscopicdiseaseseverityofpatientswithulcerativecolitis
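
The PPV and NPV reported in the abstract can be roughly cross-checked from the reported sensitivity (83.0%), specificity (96.0%), and the share of images graded moderate-to-severe (3980 of 16 514). The sketch below applies Bayes' rule to those pooled figures. It is an illustrative consistency check, not the authors' analysis: the function name `predictive_values` is hypothetical, and because the study averaged over repeated 90%/10% splits, agreement only to within about 0.01 of the published 0.87 and 0.94 should be expected.

```python
def predictive_values(sensitivity, specificity, prevalence):
    """Return (PPV, NPV) from sensitivity, specificity, and prevalence.

    All arguments are proportions in [0, 1]. This is plain Bayes' rule on
    pooled summary figures (an assumption; the study reports
    cross-validated estimates).
    """
    true_pos = sensitivity * prevalence
    false_neg = (1 - sensitivity) * prevalence
    true_neg = specificity * (1 - prevalence)
    false_pos = (1 - specificity) * (1 - prevalence)
    ppv = true_pos / (true_pos + false_pos)
    npv = true_neg / (true_neg + false_neg)
    return ppv, npv


# Figures taken from the abstract; prevalence is the fraction of images
# classified as moderate-to-severe disease by the adjudicated reference.
prevalence = 3980 / 16514
ppv, npv = predictive_values(0.83, 0.96, prevalence)
print(f"prevalence = {prevalence:.3f}, PPV = {ppv:.3f}, NPV = {npv:.3f}")
```

Because PPV and NPV depend on prevalence, the same sensitivity and specificity would give different predictive values in a cohort with a different proportion of moderate-to-severe disease.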
