Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases
BACKGROUND: A decision tree is a crucial tool for describing the factors related to cardiovascular disease (CVD) risk and for predicting and explaining it for patients. Notably, the decision tree must be simplified because patients may have different primary topics or factors related to the CVD risk...
Main Author: | Liewlom, Peera |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2023 |
Subjects: | Research |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386781/ https://www.ncbi.nlm.nih.gov/pubmed/37507769 http://dx.doi.org/10.1186/s12911-023-02228-x |
_version_ | 1785081752453120000 |
---|---|
author | Liewlom, Peera |
collection | PubMed |
description | BACKGROUND: A decision tree is a crucial tool for describing the factors related to cardiovascular disease (CVD) risk and for predicting and explaining it for patients. Notably, the decision tree must be simplified because patients may have different primary topics or factors related to the CVD risk. Many decision trees can describe the data collected from multiple environmental heart disease risk datasets or a forest, where each tree describes the CVD risk for each primary topic. METHODS: We demonstrate the presence of trees, or a forest, using an integrated CVD dataset obtained from multiple datasets. Moreover, we apply a novel method to an association-rule tree to discover each primary topic hidden within a dataset. To generalize the tree structure for descriptive tasks, each primary topic is a boundary node acting as a root node of a C4.5 tree with the least prodigality for the tree structure (PTS). All trees are assigned to a descriptive forest describing the CVD risks in a dataset. A descriptive forest is used to describe each CVD patient’s primary risk topics and related factors. We describe eight primary topics in a descriptive forest acquired from 918 records of a heart failure–prediction dataset with 11 features obtained from five datasets. We apply the proposed method to 253,680 records with 22 features from imbalanced classes of a heart disease health–indicators dataset. RESULTS: The usability of the descriptive forest is demonstrated by a comparative study (on qualitative and quantitative tasks of the CVD-risk explanation) with a C4.5 tree generated from the same dataset but with the least PTS. The qualitative descriptive task confirms that compared to a single C4.5 tree, the descriptive forest is more flexible and can better describe the CVD risk, whereas the quantitative descriptive task confirms that it achieved higher coverage (recall) and correctness (accuracy and precision) and provided more detailed explanations. Additionally, for these tasks, the descriptive forest still outperforms the C4.5 tree. To reduce the problem of imbalanced classes, the ratio of classes in each subdataset generating each tree is investigated. CONCLUSION: The results provide confidence for using the descriptive forest. |
format | Online Article Text |
id | pubmed-10386781 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-10386781 2023-07-30 Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases Liewlom, Peera BMC Med Inform Decis Mak Research BioMed Central 2023-07-28 /pmc/articles/PMC10386781/ /pubmed/37507769 http://dx.doi.org/10.1186/s12911-023-02228-x Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
spellingShingle | Research Liewlom, Peera Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases |
title | Descriptive forest: experiments on a novel tree-structure-generalization method for describing cardiovascular diseases |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10386781/ https://www.ncbi.nlm.nih.gov/pubmed/37507769 http://dx.doi.org/10.1186/s12911-023-02228-x |
work_keys_str_mv | AT liewlompeera descriptiveforestexperimentsonanoveltreestructuregeneralizationmethodfordescribingcardiovasculardiseases |
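
The description field reports the quantitative descriptive task in terms of coverage (recall) and correctness (accuracy and precision), and notes that the ratio of classes in each subdataset is examined. As a minimal sketch of how these measures are conventionally computed from binary labels, the following may help; it is not the article's evaluation code. The function names, the label encoding (1 = CVD, 0 = no CVD), and the toy labels are all hypothetical and are not taken from the record.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive=1):
    """Count TP, FP, TN, FN for one positive class (assumed encoding: 1 = CVD)."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def descriptive_scores(y_true, y_pred, positive=1):
    """Return coverage (recall) and correctness (precision, accuracy)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    total = tp + fp + tn + fn
    return {
        "recall": tp / (tp + fn) if (tp + fn) else 0.0,
        "precision": tp / (tp + fp) if (tp + fp) else 0.0,
        "accuracy": (tp + tn) / total if total else 0.0,
    }


def class_ratio(labels):
    """Fraction of records per class, e.g. to inspect imbalance in a subdataset."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


# Toy labels, invented purely for illustration.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

scores = descriptive_scores(y_true, y_pred)
ratios = class_ratio(y_true)
print(scores)
print(ratios)
```

With imbalanced classes, accuracy alone can look high even when few positive cases are covered, which is one reason to read recall and precision alongside it.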