
Cancer Metastasis Prediction and Genomic Biomarker Identification through Machine Learning and eXplainable Artificial Intelligence in Breast Cancer Research

Bibliographic Details
Main authors: Yagin, Burak; Yagin, Fatma Hilal; Colak, Cemil; Inceoglu, Feyza; Kadry, Seifedine; Kim, Jungeun
Format: Online, Article, Text
Language: English
Published: MDPI 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10650093/
https://www.ncbi.nlm.nih.gov/pubmed/37958210
http://dx.doi.org/10.3390/diagnostics13213314
Collection: PubMed
Description:
Aim: This research presents a model combining machine learning (ML) techniques and eXplainable artificial intelligence (XAI) to predict breast cancer (BC) metastasis and to reveal important genomic biomarkers in patients with metastasis.
Method: A total of 98 primary BC samples was analyzed, comprising 34 samples from patients who developed distant metastases within a 5-year follow-up period and 44 samples from patients who remained disease-free for at least 5 years after diagnosis. The genomic data were subjected to biostatistical analysis, followed by elastic net feature selection, which identified a small number of genomic biomarkers associated with BC metastasis. The light gradient boosting machine (LightGBM), categorical boosting (CatBoost), extreme gradient boosting (XGBoost), gradient boosting trees (GBT), and adaptive boosting (AdaBoost) algorithms were used for prediction. The models’ predictive abilities were assessed with accuracy, F1 score, precision, recall, area under the ROC curve (AUC), and Brier score. To promote interpretability and overcome the “black box” problem of ML models, the SHapley Additive exPlanations (SHAP) method was employed.
Results: The LightGBM model outperformed the other models, with an accuracy of 96% and an AUC of 99.3%. In addition to the biostatistical evaluation, the XAI-based SHAP results showed that increased expression of TSPYL5, ATP5E, CA9, NUP210, SLC37A1, ARIH1, PSMD7, UBQLN1, PRAME, and UBE2T (p ≤ 0.05) was associated with an increased incidence of BC metastasis. Conversely, decreased expression of CACTIN, TGFB3, SCUBE2, ARL4D, OR1F1, ALDH4A1, PHF1, and CROCC (p ≤ 0.05) was also found to be associated with an increased risk of metastasis in BC.
Conclusion: The findings of this study may help prevent disease progression and metastasis and potentially improve clinical outcomes by informing customized treatment approaches for BC patients.
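The abstract lists accuracy, F1 score, precision, recall, AUC, and Brier score as the evaluation metrics. The following is a minimal, standard-library-only Python sketch of how these metrics are conventionally computed for a binary metastasis/no-metastasis outcome. The function names, the 0.5 decision threshold, and the toy data are illustrative assumptions, not the authors' code or data.

```python
# Minimal sketch of the evaluation metrics listed in the abstract.
# Assumptions (not from the paper): label 1 = distant metastasis within 5 years,
# label 0 = disease-free; `probs` are predicted probabilities for label 1;
# class predictions use a 0.5 threshold. All function names are hypothetical.
from itertools import product


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, fn, tn) for binary 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn


def roc_auc(y_true, probs):
    """AUC as the probability that a random positive scores above a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, probs) if t == 1]
    neg = [s for t, s in zip(y_true, probs) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p, n in product(pos, neg))
    return wins / (len(pos) * len(neg))


def brier_score(y_true, probs):
    """Mean squared error between predicted probability and the 0/1 outcome (lower is better)."""
    return sum((p - t) ** 2 for t, p in zip(y_true, probs)) / len(y_true)


def evaluate(y_true, probs, threshold=0.5):
    """Compute accuracy, precision, recall, F1, AUC and Brier score for one model."""
    y_pred = [1 if p >= threshold else 0 for p in probs]
    tp, fp, fn, tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "auc": roc_auc(y_true, probs),
        "brier": brier_score(y_true, probs),
    }


if __name__ == "__main__":
    # Toy labels and probabilities for illustration only (not study data).
    print(evaluate([1, 1, 0, 0], [0.9, 0.4, 0.3, 0.1]))
```

With the toy data, accuracy is 0.75 but AUC is 1.0. The ranking is perfect, yet one metastasis case falls below the 0.5 threshold. This is why a threshold-free metric (AUC) and a calibration metric (Brier score) are usually reported alongside accuracy. The pairwise AUC computation is O(n_pos × n_neg), which is adequate for cohorts of this size.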
Record ID: pubmed-10650093
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Diagnostics (Basel); MDPI; published 2023-10-26
License: © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Topic: Article