Preparing CT imaging datasets for deep learning in lung nodule analysis: Insights from four well-known datasets

Background: Deep learning is an important means to realize the automatic detection, segmentation, and classification of pulmonary nodules in computed tomography (CT) images. An entire CT scan cannot directly be used by deep learning models due to image size, image format, image dimensionality, and o...

Bibliographic Details
Main authors: Wang, Jingxuan; Sourlos, Nikos; Zheng, Sunyi; van der Velden, Nils; Pelgrim, Gert Jan; Vliegenthart, Rozemarijn; van Ooijen, Peter
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10361226/
https://www.ncbi.nlm.nih.gov/pubmed/37484314
http://dx.doi.org/10.1016/j.heliyon.2023.e17104
author Wang, Jingxuan
Sourlos, Nikos
Zheng, Sunyi
van der Velden, Nils
Pelgrim, Gert Jan
Vliegenthart, Rozemarijn
van Ooijen, Peter
collection PubMed
description Background: Deep learning is an important means of automating the detection, segmentation, and classification of pulmonary nodules in computed tomography (CT) images. An entire CT scan cannot be used directly by deep learning models because of image size, image format, image dimensionality, and other factors. Between acquiring the CT scan and feeding the data into the deep learning model, there are several steps, including obtaining data use permission, data access and download, data annotation, and data preprocessing. This paper aims to provide a complete and detailed guide for researchers who want to engage in interdisciplinary research combining lung nodule analysis on CT images with artificial intelligence (AI) engineering. Methods: The data preparation pipeline is illustrated with four popular large-scale datasets: LIDC-IDRI (Lung Image Database Consortium image collection), LUNA16 (Lung Nodule Analysis 2016), NLST (National Lung Screening Trial) and NELSON (The Dutch-Belgian Randomized Lung Cancer Screening Trial). The dataset preparation steps are presented in chronological order. Findings: The data preparation steps required before deep learning were identified. These include both generic steps and steps specific to lung nodule research. For each step, the required process, its necessity, and example code or tools for implementation are provided. Discussion and conclusion: Depending on the specific research question, researchers should be aware of the various preparation steps required and carefully select datasets, data annotation methods, and image preprocessing methods. Moreover, it is vital to acknowledge that each auxiliary tool or piece of code has its own scope of use and limitations. This paper proposes a standardized data preparation process while clearly demonstrating the principles and sequence of the different steps. A data preparation pipeline can be quickly realized by following these proposed steps and implementing the suggested example code and tools.
format Online
Article
Text
id pubmed-10361226
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10361226 2023-07-22 Preparing CT imaging datasets for deep learning in lung nodule analysis: Insights from four well-known datasets Heliyon Research Article Elsevier 2023-06-16 /pmc/articles/PMC10361226/ /pubmed/37484314 http://dx.doi.org/10.1016/j.heliyon.2023.e17104 Text en © 2023 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
title Preparing CT imaging datasets for deep learning in lung nodule analysis: Insights from four well-known datasets
topic Research Article