
Mining causal relationships among clinical variables for cancer diagnosis based on Bayesian analysis

BACKGROUND: Cancer is the second leading cause of death around the world after cardiovascular diseases. Over the past decades, various data mining studies have tried to predict the outcome of cancer. However, only a few reports describe the causal relationships among clinical variables or attributes...


Bibliographic Details
Main Author:	Wang, LiMin
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2015
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4404584/ https://www.ncbi.nlm.nih.gov/pubmed/25901184 http://dx.doi.org/10.1186/s13040-015-0046-4

author	Wang, LiMin
collection	PubMed
description	BACKGROUND: Cancer is the second leading cause of death around the world after cardiovascular diseases. Over the past decades, various data mining studies have tried to predict the outcome of cancer. However, only a few reports describe the causal relationships among clinical variables or attributes, which may provide theoretical guidance for cancer diagnosis and therapy. Different restricted Bayesian classifiers have been used to discover information from numerous domains. This research work designed a novel Bayesian learning strategy to predict cause-specific death classes and proposed a graphical structure of key attributes to clarify the relationships implicit in the data set. RESULTS: The working mechanisms of 3 classical restricted Bayesian classifiers, namely, naive Bayes (NB), tree-augmented naive Bayes (TAN) and the K-dependence Bayesian classifier (KDB), were analysed and summarised. To retain the properties of global optimisation and high-order dependency representation, the proposed learning algorithm, i.e., flexible K-dependence Bayesian network (FKBN), applies a greedy search over the conditional mutual information space to identify the globally optimal ordering of the attributes and to allow the classifiers to be constructed at arbitrary points (values of K) along the attribute dependence spectrum. This method represents the relationships between different attributes by using a directed acyclic graph (DAG) model. A total of 12 data sets from the SEER database and the KRBM repository were used for evaluation with 10-fold cross-validation. The findings revealed that the FKBN model outperformed NB, TAN and KDB. CONCLUSIONS: A Bayesian classifier can graphically describe the conditional dependency among attributes. The proposed algorithm offers a trade-off between probability estimation and network structure complexity. The direct and indirect relationships between the predictive attributes and the class variable should be considered simultaneously to achieve global optimisation and high-order dependency representation. By analysing the DAG inferred from the breast cancer data set of the SEER database, we divided the attributes into two subgroups, namely, key attributes that should be considered first for cancer diagnosis and those that are independent of each other but are closely related to key attributes. The statistical analysis results clarify some of the causal relationships implied by the DAG.
format	Online Article Text
id	pubmed-4404584
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4404584 2015-04-22 Mining causal relationships among clinical variables for cancer diagnosis based on Bayesian analysis Wang, LiMin BioData Min Research BioMed Central 2015-04-16 /pmc/articles/PMC4404584/ /pubmed/25901184 http://dx.doi.org/10.1186/s13040-015-0046-4 Text en © Wang; licensee BioMed Central. 2015 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title	Mining causal relationships among clinical variables for cancer diagnosis based on Bayesian analysis
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4404584/ https://www.ncbi.nlm.nih.gov/pubmed/25901184 http://dx.doi.org/10.1186/s13040-015-0046-4
work_keys_str_mv	AT wanglimin miningcausalrelationshipsamongclinicalvariablesforcancerdiagnosisbasedonbayesiananalysis
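
The abstract in this record mentions ranking attributes by conditional mutual information and limiting each attribute to K attribute parents. The record does not give the details of FKBN, so the following is only a minimal sketch of the classical KDB-style structure search that the abstract names as a baseline, not the author's algorithm. The function names, the toy columns `x1`, `x2`, `x3` and the labels are hypothetical and exist only for illustration.

```python
from collections import Counter
from math import log2


def mutual_information(xs, cs):
    """Estimate I(X;C) in bits from empirical frequencies."""
    n = len(xs)
    n_xc = Counter(zip(xs, cs))
    n_x = Counter(xs)
    n_c = Counter(cs)
    return sum(
        (count / n) * log2(count * n / (n_x[x] * n_c[c]))
        for (x, c), count in n_xc.items()
    )


def conditional_mutual_information(xs, ys, cs):
    """Estimate I(X;Y|C) in bits from empirical frequencies."""
    n = len(xs)
    n_xyc = Counter(zip(xs, ys, cs))
    n_xc = Counter(zip(xs, cs))
    n_yc = Counter(zip(ys, cs))
    n_c = Counter(cs)
    return sum(
        (count / n) * log2(count * n_c[c] / (n_xc[(x, c)] * n_yc[(y, c)]))
        for (x, y, c), count in n_xyc.items()
    )


def kdb_structure(columns, labels, k):
    """KDB-style search: order attributes by I(X;C), then give each
    attribute at most k parents chosen from the attributes ranked
    before it, preferring the highest I(X;Y|C). The class variable is
    an implicit parent of every attribute and is not listed."""
    order = sorted(
        columns,
        key=lambda a: mutual_information(columns[a], labels),
        reverse=True,
    )
    parents = {}
    for i, attr in enumerate(order):
        ranked = sorted(
            order[:i],
            key=lambda b: conditional_mutual_information(
                columns[attr], columns[b], labels
            ),
            reverse=True,
        )
        parents[attr] = ranked[:k]
    return order, parents


# Hypothetical toy data: three discrete attributes and a binary class.
toy_labels = [0, 0, 0, 0, 1, 1, 1, 1]
toy_columns = {
    "x1": [0, 0, 0, 0, 1, 1, 1, 1],
    "x2": [0, 0, 0, 1, 1, 1, 1, 0],
    "x3": [0, 1, 0, 1, 0, 1, 0, 1],
}

if __name__ == "__main__":
    order, parents = kdb_structure(toy_columns, toy_labels, k=1)
    print("attribute order:", order)
    print("attribute parents:", parents)
```

With `k=0` the sketch reduces to the naive Bayes structure (no attribute-to-attribute arcs), and larger `k` values permit higher-order dependencies, which is the "dependence spectrum" the abstract refers to.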

