A Bayesian latent class extension of naive Bayesian classifier and its application to the classification of gastric cancer patients
BACKGROUND: The Naive Bayes (NB) classifier is a powerful supervised algorithm widely used in Machine Learning (ML). However, its effectiveness relies on a strict assumption of conditional independence, which is often violated in real-world scenarios. To address this limitation, various studies have...
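Main authors: | Gohari, Kimiya; Kazemnejad, Anoshirvan; Mohammadi, Marjan; Eskandari, Farzad; Saberi, Samaneh; Esmaieli, Maryam; Sheidaei, Ali |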
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10440900/ https://www.ncbi.nlm.nih.gov/pubmed/37605107 http://dx.doi.org/10.1186/s12874-023-02013-4 |
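
The conditional independence assumption mentioned in the summary above can be stated compactly. The notation here (class $C$, attributes $X_1, \dots, X_p$) is introduced only for illustration and is not taken from the record:

```latex
P(C = c \mid x_1, \dots, x_p) \;\propto\; P(C = c) \prod_{j=1}^{p} P(x_j \mid C = c)
```

When two attributes remain strongly dependent given the class, the product treats them as separate pieces of evidence and effectively counts the same information twice, which is the kind of violation the summary refers to.
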
_version_ | 1785093252902289408 |
---|---|
author | Gohari, Kimiya Kazemnejad, Anoshirvan Mohammadi, Marjan Eskandari, Farzad Saberi, Samaneh Esmaieli, Maryam Sheidaei, Ali |
author_sort | Gohari, Kimiya |
collection | PubMed |
description | BACKGROUND: The Naive Bayes (NB) classifier is a powerful supervised algorithm widely used in machine learning (ML). However, its effectiveness relies on a strict assumption of conditional independence, which is often violated in real-world data. Various studies have therefore explored extensions of NB that relax this assumption; they fall broadly into two categories: feature selection and structure expansion. In this study, we propose enhancing NB by introducing a latent variable as the parent of the attributes. We define this latent variable using a flexible technique called Bayesian Latent Class Analysis (BLCA). The resulting model, which we call NB-BLCA, combines the strengths of NB and BLCA. By incorporating the latent variable, we aim to capture complex dependencies among the attributes and improve the overall performance of the classifier. METHODS: Both the Expectation-Maximization (EM) algorithm and Gibbs sampling were offered for parameter learning. A simulation study was conducted to compare the classification performance of the model with that of the ordinary NB model. In addition, real-world data on 976 gastric cancer (GC) and 1189 non-ulcer dyspepsia (NUD) patients were used to show the model's performance in an actual application. The validity of the models was evaluated using 10-fold cross-validation. RESULTS: The presented model was superior to ordinary NB in all simulation scenarios, with higher classification sensitivity and specificity in test data. The accuracy of the NB-BLCA model using Gibbs sampling was 87.77 (95% CI: 84.87-90.29). The corresponding accuracy was 77.22 (95% CI: 73.64-80.53) for the NB-BLCA model using the EM algorithm and 74.71 (95% CI: 71.02-78.15) for the ordinary NB classifier. CONCLUSIONS: Incorporating a latent component into the NB classifier offers numerous advantages, particularly in medical and health-related contexts. Researchers can bypass the extensive search and structure learning required by the local learning and structure extension approaches. The inclusion of latent class variables allows all attributes to be integrated during model construction. Consequently, the NB-BLCA model is a suitable alternative to conventional NB classifiers when the assumption of independence is violated, especially in domains pertaining to health and medicine. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12874-023-02013-4. |
format | Online Article Text |
id | pubmed-10440900 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10440900 2023-08-22 A Bayesian latent class extension of naive Bayesian classifier and its application to the classification of gastric cancer patients Gohari, Kimiya Kazemnejad, Anoshirvan Mohammadi, Marjan Eskandari, Farzad Saberi, Samaneh Esmaieli, Maryam Sheidaei, Ali BMC Med Res Methodol Research BioMed Central 2023-08-21 /pmc/articles/PMC10440900/ /pubmed/37605107 http://dx.doi.org/10.1186/s12874-023-02013-4 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | A Bayesian latent class extension of naive Bayesian classifier and its application to the classification of gastric cancer patients |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10440900/ https://www.ncbi.nlm.nih.gov/pubmed/37605107 http://dx.doi.org/10.1186/s12874-023-02013-4 |
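
The description field above outlines NB-BLCA only at a high level, so the following is a minimal sketch rather than the authors' implementation. It assumes binary attributes, fits a separate latent class model (a mixture of independent Bernoullis) within each outcome class using plain EM with add-one smoothing, and classifies by summing over the latent class. It does not implement the Bayesian priors or the Gibbs sampler, and all function names are hypothetical.

```python
import math
import random


def _log_joint(x, weights, theta):
    """log(pi_l * prod_j P(x_j | l)) for every latent class l, for one 0/1 row x."""
    return [
        math.log(w) + sum(math.log(t if xj else 1.0 - t) for xj, t in zip(x, th))
        for w, th in zip(weights, theta)
    ]


def _logsumexp(values):
    """Numerically stable log(sum(exp(v))) for a list of floats."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def fit_lca_em(rows, n_latent=2, n_iter=50, seed=0):
    """EM for a latent class model over binary attributes (illustrative only)."""
    rng = random.Random(seed)
    d = len(rows[0])
    weights = [1.0 / n_latent] * n_latent
    # Random start so the latent classes are not identical at iteration 0.
    theta = [[rng.uniform(0.25, 0.75) for _ in range(d)] for _ in range(n_latent)]
    for _ in range(n_iter):
        # E-step: responsibility of each latent class for each row.
        resp = []
        for x in rows:
            lj = _log_joint(x, weights, theta)
            norm = _logsumexp(lj)
            resp.append([math.exp(v - norm) for v in lj])
        # M-step: update mixing weights and item probabilities.
        for l in range(n_latent):
            nl = sum(r[l] for r in resp)
            weights[l] = (nl + 1e-9) / (len(rows) + n_latent * 1e-9)
            theta[l] = [
                (sum(r[l] * x[j] for r, x in zip(resp, rows)) + 1.0) / (nl + 2.0)
                for j in range(d)
            ]
    return weights, theta


def fit_nb_lca(rows, labels, n_latent=2):
    """Fit class priors plus one latent class model per outcome class."""
    model = {}
    for c in sorted(set(labels)):
        subset = [x for x, y in zip(rows, labels) if y == c]
        model[c] = (len(subset) / len(rows), fit_lca_em(subset, n_latent))
    return model


def predict_proba(model, x):
    """Posterior P(c | x), summing over the latent class within each class c."""
    log_post = {
        c: math.log(prior) + _logsumexp(_log_joint(x, w, th))
        for c, (prior, (w, th)) in model.items()
    }
    norm = _logsumexp(list(log_post.values()))
    return {c: math.exp(v - norm) for c, v in log_post.items()}


if __name__ == "__main__":
    # Toy XOR-like data: attributes agree in class 0 and disagree in class 1.
    rows = [[1, 1], [0, 0], [1, 1], [0, 0], [1, 0], [0, 1], [1, 0], [0, 1]]
    labels = [0, 0, 0, 0, 1, 1, 1, 1]
    model = fit_nb_lca(rows, labels, n_latent=2)
    print(predict_proba(model, [1, 1]))
```

With `n_latent=1` the sketch reduces to an ordinary Bernoulli NB classifier with add-one smoothing, which makes it easy to compare the two on the same data.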