
Application of an efficient Bayesian discretization method to biomedical data

BACKGROUND: Several data mining methods require data that are discrete, and other methods often perform better with discrete data. We introduce an efficient Bayesian discretization (EBD) method for optimal discretization of variables that runs efficiently on high-dimensional biomedical datasets. The...


Bibliographic Details
Main authors: Lustgarten, Jonathan L; Visweswaran, Shyam; Gopalakrishnan, Vanathi; Cooper, Gregory F
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3162539/
https://www.ncbi.nlm.nih.gov/pubmed/21798039
http://dx.doi.org/10.1186/1471-2105-12-309
author Lustgarten, Jonathan L
Visweswaran, Shyam
Gopalakrishnan, Vanathi
Cooper, Gregory F
collection PubMed
description BACKGROUND: Several data mining methods require data that are discrete, and other methods often perform better with discrete data. We introduce an efficient Bayesian discretization (EBD) method for optimal discretization of variables that runs efficiently on high-dimensional biomedical datasets. The EBD method consists of two components, namely, a Bayesian score to evaluate discretizations and a dynamic programming search procedure to efficiently search the space of possible discretizations. We compared the performance of EBD to Fayyad and Irani's (FI) discretization method, which is commonly used for discretization. RESULTS: On 24 biomedical datasets obtained from high-throughput transcriptomic and proteomic studies, the classification performances of the C4.5 classifier and the naïve Bayes classifier were statistically significantly better when the predictor variables were discretized using EBD over FI. EBD was statistically significantly more stable to the variability of the datasets than FI. However, EBD was less robust, though not statistically significantly so, than FI and produced slightly more complex discretizations than FI. CONCLUSIONS: On a range of biomedical datasets, a Bayesian discretization method (EBD) yielded better classification performance and stability but was less robust than the widely used FI discretization method. The EBD discretization method is easy to implement, permits the incorporation of prior knowledge and belief, and is sufficiently fast for application to high-dimensional data.
format Online
Article
Text
id pubmed-3162539
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-31625392011-08-27 Application of an efficient Bayesian discretization method to biomedical data Lustgarten, Jonathan L Visweswaran, Shyam Gopalakrishnan, Vanathi Cooper, Gregory F BMC Bioinformatics Research Article BACKGROUND: Several data mining methods require data that are discrete, and other methods often perform better with discrete data. We introduce an efficient Bayesian discretization (EBD) method for optimal discretization of variables that runs efficiently on high-dimensional biomedical datasets. The EBD method consists of two components, namely, a Bayesian score to evaluate discretizations and a dynamic programming search procedure to efficiently search the space of possible discretizations. We compared the performance of EBD to Fayyad and Irani's (FI) discretization method, which is commonly used for discretization. RESULTS: On 24 biomedical datasets obtained from high-throughput transcriptomic and proteomic studies, the classification performances of the C4.5 classifier and the naïve Bayes classifier were statistically significantly better when the predictor variables were discretized using EBD over FI. EBD was statistically significantly more stable to the variability of the datasets than FI. However, EBD was less robust, though not statistically significantly so, than FI and produced slightly more complex discretizations than FI. CONCLUSIONS: On a range of biomedical datasets, a Bayesian discretization method (EBD) yielded better classification performance and stability but was less robust than the widely used FI discretization method. The EBD discretization method is easy to implement, permits the incorporation of prior knowledge and belief, and is sufficiently fast for application to high-dimensional data. BioMed Central 2011-07-28 /pmc/articles/PMC3162539/ /pubmed/21798039 http://dx.doi.org/10.1186/1471-2105-12-309 Text en Copyright ©2011 Lustgarten et al; licensee BioMed Central Ltd. 
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Application of an efficient Bayesian discretization method to biomedical data
topic Research Article
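The abstract describes EBD as a Bayesian score for evaluating discretizations combined with a dynamic programming search over the possible cut points. The sketch below illustrates that general pattern only; it is not the authors' scoring function. It assumes a uniform Dirichlet prior over class proportions within each interval and a constant per-interval prior, and the names `interval_log_score`, `discretize`, and `log_interval_prior` are hypothetical.

```python
from math import lgamma, log, inf


def interval_log_score(counts):
    """Log marginal likelihood of class counts in one interval.

    Assumption: uniform Dirichlet(1, ..., 1) prior over class proportions.
    This is an illustrative score, not the score defined in the paper.
    """
    k = len(counts)
    n = sum(counts)
    return lgamma(k) - lgamma(k + n) + sum(lgamma(1 + c) for c in counts)


def discretize(values, labels, log_interval_prior=log(0.5)):
    """Return sorted cut points maximizing the summed interval scores.

    log_interval_prior is a hypothetical constant added once per interval;
    it penalizes discretizations with many intervals.
    """
    n = len(values)
    if n == 0:
        return []
    pairs = sorted(zip(values, labels), key=lambda p: p[0])
    xs = [v for v, _ in pairs]

    # Map each class label to a column index.
    index = {}
    for lab in labels:
        index.setdefault(lab, len(index))
    k = len(index)

    # prefix[j][c] = number of class-c items among the first j sorted items.
    prefix = [[0] * k]
    for _, lab in pairs:
        row = prefix[-1][:]
        row[index[lab]] += 1
        prefix.append(row)

    # A boundary may only fall between two distinct values.
    starts = [0] + [i for i in range(1, n) if xs[i - 1] != xs[i]]
    ends = starts[1:] + [n]

    # best[j] = best log score for the first j items; back[j] = start of
    # the last interval in that best solution.
    best = {0: 0.0}
    back = {}
    for j in ends:
        best_j, back_j = -inf, 0
        for i in starts:
            if i >= j:
                break
            counts = [prefix[j][c] - prefix[i][c] for c in range(k)]
            s = best[i] + interval_log_score(counts) + log_interval_prior
            if s > best_j:
                best_j, back_j = s, i
        best[j], back[j] = best_j, back_j

    # Walk the back-pointers to recover cut points (midpoints of neighbors).
    cuts = []
    j = n
    while back[j] > 0:
        i = back[j]
        cuts.append((xs[i - 1] + xs[i]) / 2)
        j = i
    return sorted(cuts)
```

For example, `discretize([1, 2, 3, 4, 10, 11, 12, 13], [0, 0, 0, 0, 1, 1, 1, 1])` places a single cut between 4 and 10. The search is quadratic in the number of distinct values, which is the property that makes a dynamic programming formulation attractive for high-dimensional data where each variable is discretized independently.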