
Machine Learning With K-Means Dimensional Reduction for Predicting Survival Outcomes in Patients With Breast Cancer

OBJECTIVE: Despite existing prognostic markers, breast cancer prognosis remains a difficult subject due to the complex relationships between many contributing factors and survival. This study seeks to integrate multiple clinicopathological and genomic factors with dimensional reduction across machin...


Bibliographic Details
Main Authors:	Zhao, Melissa; Tang, Yushi; Kim, Hyunkyung; Hasegawa, Kohei
Format:	Online Article Text
Language:	English
Published:	SAGE Publications 2018
Subjects:	Original Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6238199/ https://www.ncbi.nlm.nih.gov/pubmed/30455569 http://dx.doi.org/10.1177/1176935118810215

author	Zhao, Melissa; Tang, Yushi; Kim, Hyunkyung; Hasegawa, Kohei
collection	PubMed
description	OBJECTIVE: Despite existing prognostic markers, breast cancer prognosis remains a difficult subject due to the complex relationships between many contributing factors and survival. This study seeks to integrate multiple clinicopathological and genomic factors with dimensional reduction across machine learning algorithms to compare survival predictions.
METHODS: This is a secondary analysis of the data from a prospective cohort study of female patients with breast cancer enrolled in the Molecular Taxonomy of Breast Cancer International Consortium (METABRIC). We constructed a series of predictive models: ensemble models (Gradient Boosting and Random Forest), support vector machine (SVM), and artificial neural networks (ANN) for 5-year survival based on clinicopathological and gene expression data after K-means clustering with K-nearest-neighbor (KNN) classification. Model performance was evaluated by receiver operating characteristic (ROC) curve, accuracy, and calibration slope (CS). Model stability was assessed over 10 random runs in terms of ROC, accuracy, CS, and variable importance.
RESULTS: The analytic cohort is composed of 1874 patients with breast cancer. Overall, the median age was 62 years; the 5-year survival rate was 75%. ROC and accuracy were not significantly different between models (ROC and accuracy around 0.67 and 0.72 across models, respectively). However, ensemble methods resulted in better fit (CS) with stable measures of variable importance across 10 random training/validation splits. K-means clustering of gene expression profiles on training data points along with KNN classification of validation data points was a robust method of dimensional reduction. Furthermore, the gene expression cluster with the highest mortality risk was an influential factor in model prediction.
CONCLUSIONS: Using machine learning methods to construct predictive models for 5-year survival in patients with breast cancer, we demonstrated discrimination ability across models with new insight into the stability and utility of dimensional reduction on genomic features in breast cancer survival prediction.
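The abstract describes the dimensional reduction step only in outline: K-means clustering is fitted on the gene expression profiles of the training patients, and validation patients are then assigned to those clusters by KNN classification. The record gives no implementation details, so the following is only a minimal sketch of that general idea in plain Python. The toy two-dimensional "expression profiles", the choice of k = 2 clusters and 3 neighbors, the farthest-first initialization, and all function names are illustrative assumptions, not taken from the study.

```python
from collections import Counter


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first_init(points, k):
    """Deterministic initialization (an assumption made for this sketch):
    start at the first point, then repeatedly add the point that is
    farthest from all centroids chosen so far."""
    centroids = [list(points[0])]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(list(nxt))
    return centroids


def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update
    until the assignment stops changing. Returns (centroids, labels)."""
    centroids = farthest_first_init(points, k)
    labels = None
    for _ in range(n_iter):
        new_labels = [
            min(range(k), key=lambda j: squared_distance(p, centroids[j]))
            for p in points
        ]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels


def knn_assign(train_points, train_labels, query, n_neighbors=3):
    """Assign a held-out point to the majority cluster among its
    nearest training points."""
    nearest = sorted(
        range(len(train_points)),
        key=lambda i: squared_distance(query, train_points[i]),
    )[:n_neighbors]
    votes = Counter(train_labels[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Made-up toy data: two well-separated groups of training "profiles".
train_expression = [
    [0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0],
    [10.0, 10.0], [10.0, 11.0], [11.0, 10.0], [11.0, 11.0],
]
validation_expression = [[0.5, 0.4], [10.5, 10.6]]

# Clusters are learned from training data only.
_, train_clusters = kmeans(train_expression, k=2)

# Validation points are mapped onto the training clusters via KNN.
val_clusters = [
    knn_assign(train_expression, train_clusters, q, n_neighbors=3)
    for q in validation_expression
]

print("training clusters:  ", train_clusters)
print("validation clusters:", val_clusters)
```

In this arrangement each patient's expression profile is reduced to a single cluster label, which could then be passed to a downstream classifier alongside the clinicopathological variables. Learning the clusters from training data alone keeps validation data out of the clustering step. In practice a library implementation (for example scikit-learn's `KMeans` and `KNeighborsClassifier`) would normally replace the hand-written functions.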
format	Online Article Text
id	pubmed-6238199
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	SAGE Publications
record_format	MEDLINE/PubMed
spelling	pubmed-6238199 2018-11-19 Machine Learning With K-Means Dimensional Reduction for Predicting Survival Outcomes in Patients With Breast Cancer Zhao, Melissa; Tang, Yushi; Kim, Hyunkyung; Hasegawa, Kohei Cancer Inform Original Research SAGE Publications 2018-11-09 /pmc/articles/PMC6238199/ /pubmed/30455569 http://dx.doi.org/10.1177/1176935118810215 Text en © The Author(s) 2018 http://www.creativecommons.org/licenses/by-nc/4.0/ This article is distributed under the terms of the Creative Commons Attribution-NonCommercial 4.0 License (http://www.creativecommons.org/licenses/by-nc/4.0/) which permits non-commercial use, reproduction and distribution of the work without further permission provided the original work is attributed as specified on the SAGE and Open Access pages (https://us.sagepub.com/en-us/nam/open-access-at-sage).
title	Machine Learning With K-Means Dimensional Reduction for Predicting Survival Outcomes in Patients With Breast Cancer
topic	Original Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6238199/ https://www.ncbi.nlm.nih.gov/pubmed/30455569 http://dx.doi.org/10.1177/1176935118810215

