A united model for diagnosing pulmonary tuberculosis with random forest and artificial neural network
Background: Pulmonary tuberculosis (PTB) is a chronic infectious disease and is the most common type of TB. Although the sputum smear test is a gold standard for diagnosing PTB, the method has numerous limitations, including low sensitivity, low specificity, and insufficient samples. Methods: The pr...
Main authors: | Zhu, Qingqing; Liu, Jie |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Genetics |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10033863/ https://www.ncbi.nlm.nih.gov/pubmed/36968608 http://dx.doi.org/10.3389/fgene.2023.1094099 |
_version_ | 1784911080280031232 |
---|---|
author | Zhu, Qingqing Liu, Jie |
author_sort | Zhu, Qingqing |
collection | PubMed |
description | Background: Pulmonary tuberculosis (PTB) is a chronic infectious disease and is the most common type of TB. Although the sputum smear test is a gold standard for diagnosing PTB, the method has numerous limitations, including low sensitivity, low specificity, and insufficient samples. Methods: The present study aimed to identify specific biomarkers of PTB and construct a model for diagnosing PTB by combining random forest (RF) and artificial neural network (ANN) algorithms. Two publicly available cohorts of TB, namely, the GSE83456 (training) and GSE42834 (validation) cohorts, were retrieved from the Gene Expression Omnibus (GEO) database. A total of 45 and 61 differentially expressed genes (DEGs) were identified between the PTB and control samples, respectively, by screening the GSE83456 cohort. An RF classifier was used for identifying specific biomarkers, following which an ANN-based classification model was constructed for identifying PTB samples. The accuracy of the ANN model was validated using the receiver operating characteristic (ROC) curve. The proportion of 22 types of immunocytes in the PTB samples was measured using the CIBERSORT algorithm, and the correlations between the immunocytes were determined. Results: Differential analysis revealed that 11 and 22 DEGs were upregulated and downregulated, respectively, and 11 biomarkers specific to PTB were identified by the RF classifier. The weights of these biomarkers were determined and an ANN-based classification model was subsequently constructed. The model exhibited outstanding performance, as revealed by the area under the curve (AUC), which was 1.000 for the training cohort. The AUC of the validation cohort was 0.946, which further confirmed the accuracy of the model. Conclusion: Altogether, the present study successfully identified specific genetic biomarkers of PTB and constructed a highly accurate model for the diagnosis of PTB based on blood samples. 
The model developed herein can serve as a reliable reference for the early detection of PTB and provide novel perspectives into the pathogenesis of PTB. |
format | Online Article Text |
id | pubmed-10033863 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10033863 2023-03-24 A united model for diagnosing pulmonary tuberculosis with random forest and artificial neural network Zhu, Qingqing Liu, Jie Front Genet Genetics Frontiers Media S.A. 2023-03-09 /pmc/articles/PMC10033863/ /pubmed/36968608 http://dx.doi.org/10.3389/fgene.2023.1094099 Text en Copyright © 2023 Zhu and Liu. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Genetics Zhu, Qingqing Liu, Jie A united model for diagnosing pulmonary tuberculosis with random forest and artificial neural network |
title | A united model for diagnosing pulmonary tuberculosis with random forest and artificial neural network |
title_sort | united model for diagnosing pulmonary tuberculosis with random forest and artificial neural network |
topic | Genetics |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10033863/ https://www.ncbi.nlm.nih.gov/pubmed/36968608 http://dx.doi.org/10.3389/fgene.2023.1094099 |
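
The description above reports model accuracy as the area under the ROC curve (AUC): 1.000 on the training cohort and 0.946 on the validation cohort. As a reading aid, here is a minimal sketch of how an AUC can be computed from class labels and classifier scores, using the pairwise (Mann-Whitney) formulation. The function name `roc_auc` and the toy labels and scores are hypothetical, chosen only for illustration. They are not the authors' code or data, and the record does not say which software the authors used.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the probability that a random positive outscores a random negative.

    A tie between a positive and a negative counts as half a win
    (the Mann-Whitney U formulation of the ROC AUC).
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative sample")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data: 1 = PTB, 0 = control; scores are made-up model outputs.
toy_labels = [1, 1, 1, 0, 0, 0]
perfect_scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.1]      # every PTB sample ranks above every control
imperfect_scores = [0.9, 0.8, 0.25, 0.3, 0.2, 0.1]   # one PTB sample ranks below one control

print(roc_auc(toy_labels, perfect_scores))
print(roc_auc(toy_labels, imperfect_scores))
```

An AUC of 1.000 means every positive sample scored above every negative one. On training data this is also what an overfitted model would produce, which is presumably why the record reports a separate validation-cohort AUC.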