A united model for diagnosing pulmonary tuberculosis with random forest and artificial neural network

Background: Pulmonary tuberculosis (PTB) is a chronic infectious disease and is the most common type of TB. Although the sputum smear test is a gold standard for diagnosing PTB, the method has numerous limitations, including low sensitivity, low specificity, and insufficient samples. Methods: The pr...

Bibliographic Details
Main authors: Zhu, Qingqing, Liu, Jie
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10033863/
https://www.ncbi.nlm.nih.gov/pubmed/36968608
http://dx.doi.org/10.3389/fgene.2023.1094099
author Zhu, Qingqing
Liu, Jie
collection PubMed
description Background: Pulmonary tuberculosis (PTB) is a chronic infectious disease and is the most common type of TB. Although the sputum smear test is a gold standard for diagnosing PTB, the method has numerous limitations, including low sensitivity, low specificity, and insufficient samples. Methods: The present study aimed to identify specific biomarkers of PTB and construct a model for diagnosing PTB by combining random forest (RF) and artificial neural network (ANN) algorithms. Two publicly available cohorts of TB, namely, the GSE83456 (training) and GSE42834 (validation) cohorts, were retrieved from the Gene Expression Omnibus (GEO) database. A total of 45 and 61 differentially expressed genes (DEGs) were identified between the PTB and control samples, respectively, by screening the GSE83456 cohort. An RF classifier was used for identifying specific biomarkers, following which an ANN-based classification model was constructed for identifying PTB samples. The accuracy of the ANN model was validated using the receiver operating characteristic (ROC) curve. The proportion of 22 types of immunocytes in the PTB samples was measured using the CIBERSORT algorithm, and the correlations between the immunocytes were determined. Results: Differential analysis revealed that 11 and 22 DEGs were upregulated and downregulated, respectively, and 11 biomarkers specific to PTB were identified by the RF classifier. The weights of these biomarkers were determined and an ANN-based classification model was subsequently constructed. The model exhibited outstanding performance, as revealed by the area under the curve (AUC), which was 1.000 for the training cohort. The AUC of the validation cohort was 0.946, which further confirmed the accuracy of the model. Conclusion: Altogether, the present study successfully identified specific genetic biomarkers of PTB and constructed a highly accurate model for the diagnosis of PTB based on blood samples. 
The model developed herein can serve as a reliable reference for the early detection of PTB and provide novel perspectives into the pathogenesis of PTB.
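The abstract reports model quality as the area under the ROC curve (AUC). As a minimal sketch of what that number measures, the snippet below computes AUC as the fraction of (PTB, control) pairs in which the PTB sample receives the higher score, counting ties as half. The function name `roc_auc` and the scores are hypothetical illustrations, not the authors' code or data.

```python
def roc_auc(labels, scores):
    """AUC via the pairwise (Mann-Whitney) identity.

    labels: 1 for a positive (e.g. PTB) sample, 0 for a control.
    scores: classifier outputs, higher meaning "more likely positive".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    # Count positive/negative pairs that are ranked correctly; ties count half.
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical scores for illustration only (not data from the study).
labels = [1, 1, 1, 0, 0, 0]
perfect = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]     # every positive outranks every control
imperfect = [0.9, 0.8, 0.25, 0.3, 0.2, 0.1]  # one positive falls below one control

print(roc_auc(labels, perfect))
print(roc_auc(labels, imperfect))
```

An AUC of 1.000, as reported for the training cohort, means every PTB sample scored above every control in that cohort. A value below 1, as in the validation cohort, means some pairs were ranked in the wrong order.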
format Online
Article
Text
id pubmed-10033863
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-10033863 2023-03-24 A united model for diagnosing pulmonary tuberculosis with random forest and artificial neural network Zhu, Qingqing Liu, Jie Front Genet Genetics Background: Pulmonary tuberculosis (PTB) is a chronic infectious disease and is the most common type of TB. Although the sputum smear test is a gold standard for diagnosing PTB, the method has numerous limitations, including low sensitivity, low specificity, and insufficient samples. Methods: The present study aimed to identify specific biomarkers of PTB and construct a model for diagnosing PTB by combining random forest (RF) and artificial neural network (ANN) algorithms. Two publicly available cohorts of TB, namely, the GSE83456 (training) and GSE42834 (validation) cohorts, were retrieved from the Gene Expression Omnibus (GEO) database. A total of 45 and 61 differentially expressed genes (DEGs) were identified between the PTB and control samples, respectively, by screening the GSE83456 cohort. An RF classifier was used for identifying specific biomarkers, following which an ANN-based classification model was constructed for identifying PTB samples. The accuracy of the ANN model was validated using the receiver operating characteristic (ROC) curve. The proportion of 22 types of immunocytes in the PTB samples was measured using the CIBERSORT algorithm, and the correlations between the immunocytes were determined. Results: Differential analysis revealed that 11 and 22 DEGs were upregulated and downregulated, respectively, and 11 biomarkers specific to PTB were identified by the RF classifier. The weights of these biomarkers were determined and an ANN-based classification model was subsequently constructed. The model exhibited outstanding performance, as revealed by the area under the curve (AUC), which was 1.000 for the training cohort. The AUC of the validation cohort was 0.946, which further confirmed the accuracy of the model.
Conclusion: Altogether, the present study successfully identified specific genetic biomarkers of PTB and constructed a highly accurate model for the diagnosis of PTB based on blood samples. The model developed herein can serve as a reliable reference for the early detection of PTB and provide novel perspectives into the pathogenesis of PTB. Frontiers Media S.A. 2023-03-09 /pmc/articles/PMC10033863/ /pubmed/36968608 http://dx.doi.org/10.3389/fgene.2023.1094099 Text en Copyright © 2023 Zhu and Liu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title A united model for diagnosing pulmonary tuberculosis with random forest and artificial neural network
topic Genetics