Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches

Bibliographic Details
Main Authors: Jung, Jinmyung; Yoo, Sunyong
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10530902/
https://www.ncbi.nlm.nih.gov/pubmed/37761960
http://dx.doi.org/10.3390/genes14091820
_version_ 1785111594371383296
author Jung, Jinmyung
Yoo, Sunyong
collection PubMed
description Cancer metastasis accounts for approximately 90% of cancer deaths, and elucidating markers in metastasis is the first step in its prevention. To characterize metastasis marker genes (MGs) of breast cancer, XGBoost models that classify metastasis status were trained with gene expression profiles from TCGA. Then, a metastasis score (MS) was assigned to each gene by calculating the inner product between the feature importance and the AUC performance of the models. As a result, 54, 202, and 357 genes with the highest MS were characterized as MGs by empirical p-value cutoffs of 0.001, 0.005, and 0.01, respectively. The three sets of MGs were compared with those from existing metastasis marker databases, which provided significant results in most comparisons (p-value < 0.05). They were also significantly enriched in biological processes associated with breast cancer metastasis. The three MGs, SPPL2C, KRT23, and RGS7, showed highly significant results (p-value < 0.01) in the survival analysis. The MGs that could not be identified by statistical analysis (e.g., GOLM1, ELAVL1, UBP1, and AZGP1), as well as the MGs with the highest MS (e.g., ZNF676, FAM163B, LDOC2, IRF1, and STK40), were verified via the literature. Additionally, we checked how close the MGs were to each other in the protein–protein interaction networks. We expect that the characterized markers will help understand and prevent breast cancer metastasis.
format Online
Article
Text
id pubmed-10530902
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10530902 2023-09-28 Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches Jung, Jinmyung Yoo, Sunyong Genes (Basel) Article MDPI 2023-09-20 /pmc/articles/PMC10530902/ /pubmed/37761960 http://dx.doi.org/10.3390/genes14091820 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Identification of Breast Cancer Metastasis Markers from Gene Expression Profiles Using Machine Learning Approaches
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10530902/
https://www.ncbi.nlm.nih.gov/pubmed/37761960
http://dx.doi.org/10.3390/genes14091820
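
Illustrative note on the metastasis score. The description above says that each gene receives a metastasis score (MS) as the inner product between the feature importance and the AUC performance of the trained models, and that marker genes are then selected by empirical p-value cutoffs. The following is a minimal Python sketch of that idea, not the authors' implementation. The function names, the array layout (one row per model, one column per gene), and in particular the permutation null used for the empirical p-values are assumptions made here for illustration; the record does not state how the null distribution was built.

```python
import numpy as np


def metastasis_scores(importances, aucs):
    """Per-gene inner product of feature importances and model AUCs.

    importances: shape (n_models, n_genes), one importance vector per model.
    aucs:        shape (n_models,), one AUC per model.
    """
    importances = np.asarray(importances, dtype=float)
    aucs = np.asarray(aucs, dtype=float)
    if importances.ndim != 2 or importances.shape[0] != aucs.shape[0]:
        raise ValueError("expected one AUC per row of importances")
    # For gene g: MS[g] = sum over models m of aucs[m] * importances[m, g]
    return aucs @ importances


def permutation_null(importances, aucs, n_perm=200, seed=0):
    """ASSUMED null: shuffle each model's importances across genes.

    This is a hypothetical choice for the sketch; the source does not
    describe the null distribution behind its empirical p-values.
    """
    rng = np.random.default_rng(seed)
    importances = np.asarray(importances, dtype=float)
    null = [
        metastasis_scores(rng.permuted(importances, axis=1), aucs)
        for _ in range(n_perm)
    ]
    return np.concatenate(null)


def empirical_p_values(scores, null_scores):
    """Share of null scores >= each observed score, with a +1 correction."""
    scores = np.asarray(scores, dtype=float)
    null_sorted = np.sort(np.asarray(null_scores, dtype=float).ravel())
    # searchsorted(..., side="left") counts null values strictly below score
    n_ge = null_sorted.size - np.searchsorted(null_sorted, scores, side="left")
    return (n_ge + 1) / (null_sorted.size + 1)


def select_markers(gene_names, p_values, cutoff):
    """Return the gene names whose empirical p-value is below the cutoff."""
    p_values = np.asarray(p_values, dtype=float)
    return [g for g, p in zip(gene_names, p_values) if p < cutoff]
```

In use, `importances` would hold the per-model feature importances and `aucs` the matching test AUCs; calling `select_markers` with cutoffs such as 0.001, 0.005, and 0.01 mirrors the three thresholds named in the description. Weighting by AUC means that genes ranked highly by better-performing models contribute more to the score than genes ranked highly by weak models.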