
Applying Data Mining Techniques to Extract Hidden Patterns about Breast Cancer Survival in an Iranian Cohort Study

Background: Breast cancer survival has been analyzed with many standard data mining algorithms, some of which belong to the decision tree category. The ability of decision tree algorithms to visualize and formulate hidden patterns among study variables was the main reason...


Bibliographic Details
Main authors: Khalkhali, Hamid Reza, Lotfnezhad Afshar, Hadi, Esnaashar, Omid, Jabbari, Nasrollah
Format: Online Article Text
Language: English
Published: Hamadan University of Medical Sciences 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7189091/
https://www.ncbi.nlm.nih.gov/pubmed/27061994
author Khalkhali, Hamid Reza
Lotfnezhad Afshar, Hadi
Esnaashar, Omid
Jabbari, Nasrollah
collection PubMed
description Background: Breast cancer survival has been analyzed with many standard data mining algorithms, some of which belong to the decision tree category. The ability of decision tree algorithms to visualize and formulate hidden patterns among study variables was the main reason for applying, in the current study, a decision tree algorithm that had not previously been studied. Methods: The classification and regression trees (CART) algorithm was applied to a breast cancer database containing information on 569 patients from 2007-2010. The Gini impurity measure, which is used for categorical target variables, was applied. The classification error, which is a function of tree size, was measured by 10-fold cross-validation. The performance of the resulting model was evaluated in terms of accuracy, sensitivity, and specificity. Results: The CART model produced a decision tree with 17 nodes, 9 of which were associated with a set of rules. The rules were clinically meaningful. Expressed in if-then format, they showed that Stage was the most important variable for predicting breast cancer survival. The accuracy, sensitivity, and specificity were 80.3%, 93.5%, and 53%, respectively. Conclusions: The model in the current study, the first one created with CART, was able to extract useful hidden rules from a relatively small dataset.
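The two quantities named in the Methods, Gini impurity and the accuracy/sensitivity/specificity triple, can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names, the toy labels, and the choice of which class counts as "positive" are all assumptions made for illustration, and none of the numbers come from the study.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity of a node: 1 - sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def split_impurity(left, right):
    """Size-weighted Gini impurity of a candidate binary split."""
    n = len(left) + len(right)
    if n == 0:
        return 0.0
    return (len(left) / n) * gini_impurity(left) + (len(right) / n) * gini_impurity(right)


def classification_metrics(y_true, y_pred, positive):
    """Return (accuracy, sensitivity, specificity) for a binary outcome.

    `positive` names the class treated as positive; which class the
    study treated as positive is an assumption here.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return accuracy, sensitivity, specificity


# Toy, made-up labels (hypothetical; not data from the study).
toy_true = ["alive", "alive", "alive", "dead", "dead"]
toy_pred = ["alive", "alive", "dead", "dead", "alive"]
acc, sens, spec = classification_metrics(toy_true, toy_pred, positive="alive")
```

A CART-style tree chooses, at each node, the split with the lowest weighted impurity; `split_impurity` computes that quantity for a single candidate split. In a 10-fold cross-validation, metrics such as these would be computed on each held-out fold and then averaged.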
format Online
Article
Text
id pubmed-7189091
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Hamadan University of Medical Sciences
record_format MEDLINE/PubMed
spelling pubmed-7189091 2020-05-11 J Res Health Sci Original Article Hamadan University of Medical Sciences 2015-03-21 /pmc/articles/PMC7189091/ /pubmed/27061994 Text en © 2016 The Author(s); Published by Hamadan University of Medical Sciences.
http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Applying Data Mining Techniques to Extract Hidden Patterns about Breast Cancer Survival in an Iranian Cohort Study
topic Original Article