
Refining dataset curation methods for deep learning-based automated tuberculosis screening

BACKGROUND: The study objective was to determine whether unlabeled datasets can be used to further train and improve the accuracy of a deep learning system (DLS) for the detection of tuberculosis (TB) on chest radiographs (CXRs) using a two-stage semi-supervised approach. METHODS: A total of 111,622...


Bibliographic Details
Main authors:	Kim, Tae Kyung; Yi, Paul H.; Hager, Gregory D.; Lin, Cheng Ting
Format:	Online Article Text
Language:	English
Published:	AME Publishing Company 2020
Subjects:	Original Article on Role of Precision Imaging in Thoracic Disease
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7578485/ https://www.ncbi.nlm.nih.gov/pubmed/33145084 http://dx.doi.org/10.21037/jtd.2019.08.34

_version_	1783598376228487168
author	Kim, Tae Kyung; Yi, Paul H.; Hager, Gregory D.; Lin, Cheng Ting
author_facet	Kim, Tae Kyung; Yi, Paul H.; Hager, Gregory D.; Lin, Cheng Ting
author_sort	Kim, Tae Kyung
collection	PubMed
description	BACKGROUND: The study objective was to determine whether unlabeled datasets can be used to further train and improve the accuracy of a deep learning system (DLS) for the detection of tuberculosis (TB) on chest radiographs (CXRs) using a two-stage semi-supervised approach. METHODS: A total of 111,622 CXRs from the National Institutes of Health ChestX-ray14 database were collected. A cardiothoracic radiologist reviewed a subset of 11,000 CXRs and dichotomously labeled each for the presence or absence of potential TB findings; these interpretations were used to train a deep convolutional neural network (DCNN) to identify CXRs with possible TB (phase I). The best-performing algorithm was then used to label the remaining 100,622 radiographs in the database; these newly labeled images were then used to train a second DCNN (phase II). The best-performing algorithm from phase II (TBNet) was then tested against CXRs obtained from 3 separate sites (2 in the USA, 1 in China) with clinically confirmed cases of TB. Receiver operating characteristic (ROC) curves were generated and the area under the curve (AUC) was calculated. RESULTS: The phase I algorithm, trained on 11,000 expert-labeled radiographs, achieved an AUC of 0.88. The phase II algorithm, trained on images labeled by the phase I algorithm, achieved an AUC of 0.91 when tested against a TB dataset obtained from Shenzhen, China and Montgomery County, USA. The algorithm generalized well to radiographs obtained from a tertiary care hospital, achieving an AUC of 0.87; TBNet’s sensitivity, specificity, positive predictive value, and negative predictive value were 85%, 76%, 0.64, and 0.9, respectively. When TBNet was used to arbitrate discrepancies between 2 radiologists, the overall sensitivity reached 94% and the negative predictive value reached 0.96, demonstrating a synergistic effect between the algorithm’s output and radiologists’ interpretations. CONCLUSIONS: Using semi-supervised learning, we trained a deep learning algorithm that detected TB with high accuracy and demonstrated value as a CAD tool by identifying relevant CXR findings, especially in cases that were misinterpreted by radiologists. When dataset labels are noisy or absent, the described methods can significantly reduce the amount of curated data required to build clinically relevant deep learning models, which will play an important role in the era of precision medicine.
format	Online Article Text
id	pubmed-7578485
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	AME Publishing Company
record_format	MEDLINE/PubMed
spelling	pubmed-7578485 2020-11-02 Refining dataset curation methods for deep learning-based automated tuberculosis screening Kim, Tae Kyung; Yi, Paul H.; Hager, Gregory D.; Lin, Cheng Ting J Thorac Dis Original Article on Role of Precision Imaging in Thoracic Disease BACKGROUND: The study objective was to determine whether unlabeled datasets can be used to further train and improve the accuracy of a deep learning system (DLS) for the detection of tuberculosis (TB) on chest radiographs (CXRs) using a two-stage semi-supervised approach. METHODS: A total of 111,622 CXRs from the National Institutes of Health ChestX-ray14 database were collected. A cardiothoracic radiologist reviewed a subset of 11,000 CXRs and dichotomously labeled each for the presence or absence of potential TB findings; these interpretations were used to train a deep convolutional neural network (DCNN) to identify CXRs with possible TB (phase I). The best-performing algorithm was then used to label the remaining 100,622 radiographs in the database; these newly labeled images were then used to train a second DCNN (phase II). The best-performing algorithm from phase II (TBNet) was then tested against CXRs obtained from 3 separate sites (2 in the USA, 1 in China) with clinically confirmed cases of TB. Receiver operating characteristic (ROC) curves were generated and the area under the curve (AUC) was calculated. RESULTS: The phase I algorithm, trained on 11,000 expert-labeled radiographs, achieved an AUC of 0.88. The phase II algorithm, trained on images labeled by the phase I algorithm, achieved an AUC of 0.91 when tested against a TB dataset obtained from Shenzhen, China and Montgomery County, USA. The algorithm generalized well to radiographs obtained from a tertiary care hospital, achieving an AUC of 0.87; TBNet’s sensitivity, specificity, positive predictive value, and negative predictive value were 85%, 76%, 0.64, and 0.9, respectively. When TBNet was used to arbitrate discrepancies between 2 radiologists, the overall sensitivity reached 94% and the negative predictive value reached 0.96, demonstrating a synergistic effect between the algorithm’s output and radiologists’ interpretations. CONCLUSIONS: Using semi-supervised learning, we trained a deep learning algorithm that detected TB with high accuracy and demonstrated value as a CAD tool by identifying relevant CXR findings, especially in cases that were misinterpreted by radiologists. When dataset labels are noisy or absent, the described methods can significantly reduce the amount of curated data required to build clinically relevant deep learning models, which will play an important role in the era of precision medicine. AME Publishing Company 2020-09 /pmc/articles/PMC7578485/ /pubmed/33145084 http://dx.doi.org/10.21037/jtd.2019.08.34 Text en 2020 Journal of Thoracic Disease. All rights reserved. https://creativecommons.org/licenses/by-nc-nd/4.0/ Open Access Statement: This is an Open Access article distributed in accordance with the Creative Commons Attribution-NonCommercial-NoDerivs 4.0 International License (CC BY-NC-ND 4.0), which permits the non-commercial replication and distribution of the article with the strict proviso that no changes or edits are made and the original work is properly cited (including links to both the formal publication through the relevant DOI and the license). See: https://creativecommons.org/licenses/by-nc-nd/4.0/
spellingShingle	Original Article on Role of Precision Imaging in Thoracic Disease Kim, Tae Kyung Yi, Paul H. Hager, Gregory D. Lin, Cheng Ting Refining dataset curation methods for deep learning-based automated tuberculosis screening
title	Refining dataset curation methods for deep learning-based automated tuberculosis screening
title_full	Refining dataset curation methods for deep learning-based automated tuberculosis screening
title_fullStr	Refining dataset curation methods for deep learning-based automated tuberculosis screening
title_full_unstemmed	Refining dataset curation methods for deep learning-based automated tuberculosis screening
title_short	Refining dataset curation methods for deep learning-based automated tuberculosis screening
title_sort	refining dataset curation methods for deep learning-based automated tuberculosis screening
topic	Original Article on Role of Precision Imaging in Thoracic Disease
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7578485/ https://www.ncbi.nlm.nih.gov/pubmed/33145084 http://dx.doi.org/10.21037/jtd.2019.08.34
work_keys_str_mv	AT kimtaekyung refiningdatasetcurationmethodsfordeeplearningbasedautomatedtuberculosisscreening AT yipaulh refiningdatasetcurationmethodsfordeeplearningbasedautomatedtuberculosisscreening AT hagergregoryd refiningdatasetcurationmethodsfordeeplearningbasedautomatedtuberculosisscreening AT linchengting refiningdatasetcurationmethodsfordeeplearningbasedautomatedtuberculosisscreening
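
The two-stage procedure summarized in the description field (train on a small expert-labeled subset, use that model to label the remaining images, then train a second model on the machine-labeled images) can be illustrated with a minimal sketch. This is not the authors' implementation: a toy nearest-centroid classifier on one-dimensional features stands in for the DCNNs, and every function name below (`fit_centroids`, `predict`, `two_stage_train`, `screening_metrics`) is hypothetical. The last function shows how sensitivity, specificity, positive predictive value, and negative predictive value follow from confusion-matrix counts.

```python
from statistics import mean


def fit_centroids(features, labels):
    """Toy stand-in for model training: one mean feature value per class."""
    return {c: mean(x for x, y in zip(features, labels) if y == c)
            for c in set(labels)}


def predict(centroids, features):
    """Assign each feature to the class whose centroid is closest."""
    return [min(centroids, key=lambda c: abs(x - centroids[c]))
            for x in features]


def two_stage_train(labeled_x, labeled_y, unlabeled_x):
    """Two-stage semi-supervised training (pseudo-labeling), as a sketch."""
    # Phase I: train on the small expert-labeled subset.
    phase1 = fit_centroids(labeled_x, labeled_y)
    # Use the phase I model to label the remaining, unlabeled pool.
    pseudo_y = predict(phase1, unlabeled_x)
    # Phase II: train a second model on the machine-labeled pool.
    phase2 = fit_centroids(unlabeled_x, pseudo_y)
    return phase1, phase2, pseudo_y


def screening_metrics(truth, predicted):
    """Sensitivity, specificity, PPV, and NPV from binary labels (1 = TB).

    Assumes every denominator is non-zero, i.e. both classes occur in
    `truth` and in `predicted`.
    """
    pairs = list(zip(truth, predicted))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
    }
```

In this sketch the phase II model never sees the expert labels directly; it learns only from the phase I model's outputs, which mirrors the description's statement that the second DCNN was trained on the newly labeled images. Whether the original study filtered or weighted those labels is not stated in this record.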

