Deep learning for chest radiograph diagnosis: A retrospective comparison of the CheXNeXt algorithm to practicing radiologists


Bibliographic Details
Main Authors: Rajpurkar, Pranav, Irvin, Jeremy, Ball, Robyn L., Zhu, Kaylie, Yang, Brandon, Mehta, Hershel, Duan, Tony, Ding, Daisy, Bagul, Aarti, Langlotz, Curtis P., Patel, Bhavik N., Yeom, Kristen W., Shpanskaya, Katie, Blankenberg, Francis G., Seekins, Jayne, Amrhein, Timothy J., Mong, David A., Halabi, Safwan S., Zucker, Evan J., Ng, Andrew Y., Lungren, Matthew P.
Format: Online Article Text
Language: English
Published: PLoS Med, Public Library of Science, 20 November 2018
Subjects: Research Article
Collection: PubMed
License: © 2018 Rajpurkar et al. Open access under the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6245676/
https://www.ncbi.nlm.nih.gov/pubmed/30457988
http://dx.doi.org/10.1371/journal.pmed.1002686

Abstract

BACKGROUND: Chest radiograph interpretation is critical for the detection of thoracic diseases, including tuberculosis and lung cancer, which affect millions of people worldwide each year. This time-consuming task typically requires expert radiologists to read the images, leading to fatigue-based diagnostic error and a lack of diagnostic expertise in areas of the world where radiologists are not available. Recently, deep learning approaches have been able to achieve expert-level performance in medical image interpretation tasks, powered by large network architectures and fueled by the emergence of large labeled datasets. The purpose of this study was to investigate the performance of a deep learning algorithm in the detection of pathologies in chest radiographs compared with practicing radiologists.

METHODS AND FINDINGS: We developed CheXNeXt, a convolutional neural network that concurrently detects the presence of 14 different pathologies, including pneumonia, pleural effusion, pulmonary masses, and nodules, in frontal-view chest radiographs. CheXNeXt was trained and internally validated on the ChestX-ray8 dataset, with a held-out validation set of 420 images sampled to contain at least 50 cases of each of the original pathology labels. On this validation set, the majority vote of a panel of 3 board-certified cardiothoracic specialist radiologists served as the reference standard. We compared CheXNeXt's discriminative performance on the validation set to that of 9 radiologists using the area under the receiver operating characteristic curve (AUC). The radiologists included 6 board-certified radiologists (average experience 12 years, range 4–28 years) and 3 senior radiology residents from 3 academic institutions. We found that CheXNeXt achieved radiologist-level performance on 11 pathologies and did not achieve radiologist-level performance on 3 pathologies. The radiologists achieved statistically significantly higher AUCs on cardiomegaly, emphysema, and hiatal hernia, with AUCs of 0.888 (95% confidence interval [CI] 0.863–0.910), 0.911 (95% CI 0.866–0.947), and 0.985 (95% CI 0.974–0.991), respectively, whereas CheXNeXt's AUCs were 0.831 (95% CI 0.790–0.870), 0.704 (95% CI 0.567–0.833), and 0.851 (95% CI 0.785–0.909), respectively. CheXNeXt performed better than the radiologists in detecting atelectasis, with an AUC of 0.862 (95% CI 0.825–0.895), statistically significantly higher than the radiologists' AUC of 0.808 (95% CI 0.777–0.838); there were no statistically significant differences in AUCs for the other 10 pathologies. The average time to interpret the 420 images in the validation set was substantially longer for the radiologists (240 minutes) than for CheXNeXt (1.5 minutes). The main limitations of our study are that neither CheXNeXt nor the radiologists were permitted to use patient history or review prior examinations, and that evaluation was limited to a dataset from a single institution.

CONCLUSIONS: In this study, we developed and validated a deep learning algorithm that classified clinically important abnormalities in chest radiographs at a performance level comparable to practicing radiologists. Once tested prospectively in clinical settings, the algorithm could have the potential to expand patient access to chest radiograph diagnostics.
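
The abstract describes a single network that scores 14 pathologies concurrently from one frontal radiograph. As a minimal sketch of what such a multi-label classifier can look like, the PyTorch snippet below attaches a 14-way sigmoid head to a DenseNet-121 backbone; the backbone choice and all names are illustrative assumptions, since the abstract itself does not specify the architecture.

```python
# Minimal sketch of a CheXNeXt-style multi-label chest X-ray classifier.
# The DenseNet-121 backbone, class name, and constants are assumptions
# for illustration, not the authors' released implementation.
import torch
import torch.nn as nn
import torchvision.models as models

NUM_PATHOLOGIES = 14  # pneumonia, pleural effusion, mass, nodule, ...

class ChestXrayClassifier(nn.Module):
    def __init__(self, num_labels: int = NUM_PATHOLOGIES):
        super().__init__()
        self.backbone = models.densenet121(weights=None)
        # Swap the 1000-way ImageNet head for a multi-label head.
        in_features = self.backbone.classifier.in_features
        self.backbone.classifier = nn.Linear(in_features, num_labels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Independent sigmoid score per pathology: findings can co-occur
        # on a single radiograph, so this is multi-label classification,
        # not a 14-way softmax.
        return torch.sigmoid(self.backbone(x))

# Training would pair these outputs with per-label binary cross-entropy,
# e.g. nn.BCELoss()(model(images), labels.float()).
```

The per-pathology results above are reported as AUCs with 95% confidence intervals. One common way to obtain such an interval is a nonparametric bootstrap over the validation images, sketched below with NumPy and scikit-learn; the abstract does not state the authors' exact statistical procedure, so this is a generic illustration only.

```python
# Illustrative percentile-bootstrap estimate of an AUC with a 95% CI.
import numpy as np
from sklearn.metrics import roc_auc_score

def auc_with_ci(y_true, y_score, n_boot=2000, alpha=0.05, seed=0):
    """Return the point AUC and a (1 - alpha) percentile bootstrap CI."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    rng = np.random.default_rng(seed)
    point = roc_auc_score(y_true, y_score)
    n, stats = len(y_true), []
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample images with replacement
        if y_true[idx].min() == y_true[idx].max():
            continue  # AUC is undefined if a resample has only one class
        stats.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return point, (lo, hi)
```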