Mining causal relationships among clinical variables for cancer diagnosis based on Bayesian analysis

BACKGROUND: Cancer is the second leading cause of death around the world after cardiovascular diseases. Over the past decades, various data mining studies have tried to predict the outcome of cancer. However, only a few reports describe the causal relationships among clinical variables or attributes...

Bibliographic Details
Main author: Wang, LiMin
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4404584/
https://www.ncbi.nlm.nih.gov/pubmed/25901184
http://dx.doi.org/10.1186/s13040-015-0046-4
_version_ 1782367515841658880
author Wang, LiMin
author_facet Wang, LiMin
author_sort Wang, LiMin
collection PubMed
description BACKGROUND: Cancer is the second leading cause of death worldwide after cardiovascular diseases. Over the past decades, various data mining studies have tried to predict the outcome of cancer. However, only a few reports describe the causal relationships among clinical variables or attributes, which may provide theoretical guidance for cancer diagnosis and therapy. Different restricted Bayesian classifiers have been used to discover information in numerous domains. This work designed a novel Bayesian learning strategy to predict cause-specific death classes and proposed a graphical structure of key attributes to clarify the implicit relationships in the data set.
RESULTS: The working mechanisms of three classical restricted Bayesian classifiers, namely naive Bayes (NB), tree-augmented naive Bayes (TAN) and the K-dependence Bayesian classifier (KDB), were analysed and summarised. To retain the properties of global optimisation and high-order dependency representation, the proposed learning algorithm, the flexible K-dependence Bayesian network (FKBN), applies a greedy search of the conditional mutual information space to identify the globally optimal ordering of the attributes and to allow classifiers to be constructed at arbitrary points (values of K) along the attribute dependence spectrum. The method represents the relationships between attributes with a directed acyclic graph (DAG). A total of 12 data sets were selected from the SEER database and the KRBM repository, and the classifiers were evaluated by 10-fold cross-validation. The FKBN model outperformed NB, TAN and KDB.
CONCLUSIONS: A Bayesian classifier can graphically describe the conditional dependency among attributes. The proposed algorithm offers a trade-off between probability estimation and network structure complexity. The direct and indirect relationships between the predictive attributes and the class variable should be considered simultaneously to achieve global optimisation and high-order dependency representation. By analysing the DAG inferred from the breast cancer data set of the SEER database, we divided the attributes into two subgroups: key attributes that should be considered first for cancer diagnosis, and attributes that are independent of each other but closely related to the key attributes. The statistical analysis results clarify some of the causal relationships implied by the DAG.
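The abstract refers to a greedy search over conditional mutual information between attributes given the class. As a rough illustration only, the sketch below estimates I(X; Y | C) from counts and then, for a fixed attribute ordering, picks up to K earlier attributes with the highest score as parents, in the style of a generic K-dependence classifier. It is not the FKBN algorithm from the paper: the function names, the toy data and the assumption that the ordering is supplied by the caller are all hypothetical.

```python
from collections import Counter
from math import log


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X; Y | C) in nats from equal-length sequences of discrete values."""
    n = len(xs)
    if n == 0 or len(ys) != n or len(cs) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    cmi = 0.0
    for (x, y, c), count in n_xyc.items():
        # p(x,y,c) * log( p(x,y|c) / (p(x|c) * p(y|c)) ), written with raw counts
        cmi += (count / n) * log(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
    return cmi


def select_parents(columns, order, cs, k):
    """For each attribute in `order`, keep up to k earlier attributes with the highest CMI.

    `columns` maps a (hypothetical) attribute name to its list of values;
    `order` is assumed to be given, it is not derived here.
    """
    parents = {}
    for pos, name in enumerate(order):
        earlier = order[:pos]
        ranked = sorted(
            earlier,
            key=lambda other: conditional_mutual_information(
                columns[name], columns[other], cs
            ),
            reverse=True,
        )
        parents[name] = ranked[:k]
    return parents


# Toy data, invented for illustration: "a" and "a_copy" are identical, "b" is unrelated.
toy_columns = {
    "a": [0, 1, 0, 1],
    "b": [0, 0, 1, 1],
    "a_copy": [0, 1, 0, 1],
}
toy_class = [0, 0, 0, 0]
toy_parents = select_parents(toy_columns, ["a", "b", "a_copy"], toy_class, k=1)
```

With k=0 every attribute has no attribute parents, which corresponds to the naive Bayes structure; larger k admits higher-order dependencies at the cost of sparser probability estimates, which is the trade-off the conclusions mention.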
format Online
Article
Text
id pubmed-4404584
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4404584 2015-04-22 Mining causal relationships among clinical variables for cancer diagnosis based on Bayesian analysis Wang, LiMin BioData Min Research BioMed Central 2015-04-16 /pmc/articles/PMC4404584/ /pubmed/25901184 http://dx.doi.org/10.1186/s13040-015-0046-4 Text en © Wang; licensee BioMed Central. 2015 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
spellingShingle Research
Wang, LiMin
Mining causal relationships among clinical variables for cancer diagnosis based on Bayesian analysis
title Mining causal relationships among clinical variables for cancer diagnosis based on Bayesian analysis
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4404584/
https://www.ncbi.nlm.nih.gov/pubmed/25901184
http://dx.doi.org/10.1186/s13040-015-0046-4
work_keys_str_mv AT wanglimin miningcausalrelationshipsamongclinicalvariablesforcancerdiagnosisbasedonbayesiananalysis