A Robust Personalized Classification Method for Breast Cancer Metastasis Prediction

Bibliographic Details
Main authors: Adnan, Nahim; Najnin, Tanzira; Ruan, Jianhua
Format: Online Article Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9658757/
https://www.ncbi.nlm.nih.gov/pubmed/36358745
http://dx.doi.org/10.3390/cancers14215327
Description
Summary:
SIMPLE SUMMARY: Accurate prediction of breast cancer metastasis risks using gene expression data and machine learning can help improve cancer treatment and overall survival. However, breast cancer can be categorized into multiple subtypes, and a single predictive model may not work well for all patients. In this work, we propose a computational method to construct personalized models, where the key is to select a group of patients to train a different model for each testing patient. Experimental results on multiple datasets showed that the proposed method, termed Personalized Classifier with Multiple Thresholds (PCMT), achieved significantly better prediction accuracy than existing algorithms that train classifiers using all available patients or using patients belonging to a predefined subtype. In addition, the top features identified by PCMT are robust across different datasets, and include genes that are well known to be associated with subtype-specific metastasis.

ABSTRACT: Accurate prediction of breast cancer metastasis in the early stages of cancer diagnosis is crucial to reduce cancer-related deaths. With the availability of gene expression datasets, many machine-learning models have been proposed to predict breast cancer metastasis using thousands of genes simultaneously. However, the prediction accuracy of the models using gene expression often suffers from the diverse molecular characteristics across different datasets. Additionally, breast cancer is known to have many subtypes, which hinders the performance of the models aimed at all subtypes. To overcome the heterogeneous nature of breast cancer, we propose a method to obtain personalized classifiers that are trained on subsets of patients selected using the similarities between training and testing patients. Results on multiple independent datasets showed that our proposed approach significantly improved prediction accuracy compared to the models trained on the complete training dataset and models trained on specific cancer subtypes. Our results also showed that personalized classifiers trained on positively and negatively correlated patients outperformed classifiers trained only on positively correlated patients, highlighting the importance of selecting proper patient subsets for constructing personalized classifiers. Additionally, our proposed approach obtained more robust features than the other models and identified different features for different patients, making it a promising tool for designing personalized medicine for cancer patients.
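
The patient-selection idea in the abstract can be illustrated with a small sketch. This is not the authors' PCMT implementation: the record does not say which similarity measure, thresholds, or classifier the paper uses, so Pearson correlation, the single cutoff of 0.8, the nearest-centroid classifier, the fallback to all patients, and every function name and data value below are assumptions made for illustration only.

```python
import math

def pearson(x, y):
    """Pearson correlation between two expression profiles (assumed similarity measure)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no correlation information
    return cov / (sx * sy)

def select_patients(train_X, test_x, threshold, include_negative=True):
    """Indices of training patients strongly correlated with the test patient.

    With include_negative=True, strongly anti-correlated patients are kept too,
    mirroring the abstract's "positively and negatively correlated" subsets.
    """
    selected = []
    for i, row in enumerate(train_X):
        r = pearson(row, test_x)
        if r >= threshold or (include_negative and r <= -threshold):
            selected.append(i)
    return selected

def nearest_centroid_predict(X, y, x):
    """Stand-in classifier (hypothetical choice): label of the closest class centroid."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return min(centroids, key=lambda lab: math.dist(centroids[lab], x))

def personalized_predict(train_X, train_y, test_x, threshold=0.8, min_patients=4):
    """Train on the selected subset; fall back to all patients if it is unusable."""
    idx = select_patients(train_X, test_x, threshold)
    labels = {train_y[i] for i in idx}
    if len(idx) < min_patients or len(labels) < 2:
        idx = list(range(len(train_X)))  # assumed fallback, not from the paper
    X = [train_X[i] for i in idx]
    y = [train_y[i] for i in idx]
    return nearest_centroid_predict(X, y, test_x), idx

# Toy data: 6 training patients x 4 genes; label 1 = metastasis, 0 = none (made up).
train_X = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 3.0, 4.0, 5.5],
    [4.0, 3.0, 2.0, 1.0],
    [5.0, 4.0, 2.5, 1.0],
    [1.0, 3.0, 1.0, 3.0],
    [2.0, 2.1, 2.0, 2.1],
]
train_y = [1, 1, 0, 0, 1, 0]
test_x = [1.5, 2.5, 3.5, 4.5]

prediction, used = personalized_predict(train_X, train_y, test_x)
print("predicted label:", prediction, "trained on patients:", used)
```

In this toy case, selecting only positively correlated patients would leave a single class in the training subset, so no classifier could be trained on it. That is one simple way to see why including negatively correlated patients can matter, although the paper's own explanation may differ.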