Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods

Bibliographic Details
Main Authors: He, Zongzhen, Zhang, Junying, Yuan, Xiguo, Zhang, Yuanyuan
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2021
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7848170/
https://www.ncbi.nlm.nih.gov/pubmed/33537063
http://dx.doi.org/10.3389/fgene.2020.632901
_version_ 1783645073540382720
author He, Zongzhen
Zhang, Junying
Yuan, Xiguo
Zhang, Yuanyuan
author_facet He, Zongzhen
Zhang, Junying
Yuan, Xiguo
Zhang, Yuanyuan
author_sort He, Zongzhen
collection PubMed
description Breast cancer is the most common malignancy in women, and because of its high mortality rate, computational methods that increase the accuracy of breast cancer survival prediction models are urgently needed. Although multi-omics data such as gene expression have been extensively used in recent studies, the accurate prognosis of breast cancer remains a challenge. Somatic mutations are another important and promising data source for studying cancer development, and their effect on the prognosis of breast cancer remains to be further explored. Meanwhile, these omics datasets are high-dimensional and redundant. Therefore, we adopted multiple kernel learning (MKL) to efficiently integrate somatic mutations with currently used molecular data, including gene expression, copy number variation (CNV), methylation, and protein expression data, for the prediction of breast cancer survival. Before integration, the maximum relevance minimum redundancy (mRMR) feature selection method was used to select, for each type of data, features that show high relevance to survival and low redundancy among themselves. The experimental results demonstrated that the proposed method achieved the best performance and that prediction performance improved markedly when somatic mutations were included, indicating that somatic mutations are critical for improving breast cancer survival predictions. Moreover, mRMR was superior to other feature selection methods used in previous studies. Furthermore, MKL outperformed the other traditional classifiers in multi-omics data integration. Our analysis indicated that by employing promising omics data such as somatic mutations and by combining proper feature selection methods with effective integration frameworks, the predictive accuracy of breast cancer survival models can be further increased, thereby supporting better clinical diagnosis and more effective treatment for breast cancer patients.
format Online
Article
Text
id pubmed-7848170
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7848170 2021-02-02 Front Genet Genetics Frontiers Media S.A. 2021-01-18 /pmc/articles/PMC7848170/ /pubmed/33537063 http://dx.doi.org/10.3389/fgene.2020.632901 Text en Copyright © 2021 He, Zhang, Yuan and Zhang. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Integrating Somatic Mutations for Breast Cancer Survival Prediction Using Machine Learning Methods
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7848170/
https://www.ncbi.nlm.nih.gov/pubmed/33537063
http://dx.doi.org/10.3389/fgene.2020.632901