Predicting factors for survival of breast cancer patients using machine learning techniques

BACKGROUND: Breast cancer is one of the most common diseases in women worldwide. Many studies have been conducted to predict survival indicators; however, most of these analyses were performed using basic statistical methods. As an alternative, this study used machine learning techn...

Bibliographic Details
Main authors: Ganggayah, Mogana Darshini; Taib, Nur Aishah; Har, Yip Cheng; Lio, Pietro; Dhillon, Sarinder Kaur
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6431077/
https://www.ncbi.nlm.nih.gov/pubmed/30902088
http://dx.doi.org/10.1186/s12911-019-0801-4
collection PubMed
description BACKGROUND: Breast cancer is one of the most common diseases in women worldwide. Many studies have been conducted to predict survival indicators; however, most of these analyses were performed using basic statistical methods. As an alternative, this study used machine learning techniques to build models for detecting and visualising significant prognostic indicators of breast cancer survival rate.
METHODS: A large hospital-based breast cancer dataset retrieved from the University Malaya Medical Centre, Kuala Lumpur, Malaysia (n = 8066), with diagnosis information between 1993 and 2016, was used in this study. The dataset contained 23 predictor variables and one dependent variable, which referred to the survival status of the patients (alive or dead). To determine the significant prognostic factors of breast cancer survival rate, prediction models were built using decision tree, random forest, neural networks, extreme boost, logistic regression, and support vector machine. Next, the dataset was clustered based on the receptor status of breast cancer patients identified via immunohistochemistry to perform advanced modelling using random forest. Subsequently, the important variables were ranked via variable selection methods in random forest. Finally, decision trees were built and validation was performed using survival analysis.
RESULTS: In terms of both model accuracy and calibration measure, all algorithms produced close outcomes, with the lowest obtained from decision tree (accuracy = 79.8%) and the highest from random forest (accuracy = 82.7%). The important variables identified in this study were cancer stage classification, tumour size, number of total axillary lymph nodes removed, number of positive lymph nodes, types of primary treatment, and methods of diagnosis.
CONCLUSION: Interestingly, the various machine learning algorithms used in this study yielded close accuracy; hence, these methods could be used as alternative predictive tools in breast cancer survival studies, particularly in the Asian region. The important prognostic factors influencing the survival rate of breast cancer identified in this study, which were validated by survival curves, are useful and could be translated into decision support tools in the medical domain.
format Online
Article
Text
id pubmed-6431077
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6431077 2019-04-04 Predicting factors for survival of breast cancer patients using machine learning techniques Ganggayah, Mogana Darshini Taib, Nur Aishah Har, Yip Cheng Lio, Pietro Dhillon, Sarinder Kaur BMC Med Inform Decis Mak Research Article BioMed Central 2019-03-22 /pmc/articles/PMC6431077/ /pubmed/30902088 http://dx.doi.org/10.1186/s12911-019-0801-4 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Predicting factors for survival of breast cancer patients using machine learning techniques
topic Research Article