EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer

BACKGROUND: Many markers are now available for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggressiveness is limited. METHODS: In this work, we study somatic mutation data consisting of 450 metastatic brea...

Bibliographic Details
Main authors: Mirsadeghi, Leila, Haji Hosseini, Reza, Banaei-Moghaddam, Ali Mohammad, Kavousi, Kaveh
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8105935/
https://www.ncbi.nlm.nih.gov/pubmed/33962648
http://dx.doi.org/10.1186/s12920-021-00974-3
_version_ 1783689680185720832
author Mirsadeghi, Leila
Haji Hosseini, Reza
Banaei-Moghaddam, Ali Mohammad
Kavousi, Kaveh
author_sort Mirsadeghi, Leila
collection PubMed
description BACKGROUND: Many markers are now available for the prognosis and diagnosis of complex diseases such as primary breast cancer. However, our understanding of the drivers that influence cancer aggressiveness is limited. METHODS: In this work, we study somatic mutation data consisting of 450 metastatic breast tumor samples from the cBio Cancer Genomics Portal. We use four software tools to extract features from these data. We then propose an ensemble classifier (EC) learning algorithm called EARN (Ensemble of Artificial Neural Network, Random Forest, and non-linear Support Vector Machine) to evaluate plausible driver genes for metastatic breast cancer (MBCA). The decision-making strategy of the proposed ensemble is based on aggregating the predicted scores obtained from the individual classifiers in order to prioritize Homo sapiens genes annotated as protein-coding in NCBI. RESULTS: This study reports findings on several aspects of MBCA prognosis and diagnosis. First, the drivers and passengers predicted by SVM, ANN, RF, and EARN are introduced. Second, biological inferences from the predictions are discussed based on gene set enrichment analysis. Third, all learning methods are statistically validated and compared using several evaluation metrics. Finally, pathway enrichment analysis (PEA) using the ReactomeFIViz tool (FDR < 0.03) on the top 100 genes predicted by EARN leads us to propose a new gene set panel for MBCA. It includes HDAC3, ABAT, GRIN1, PLCB1, and KPNA2 as well as NCOR1, TBL1XR1, SIRT4, KRAS, CACNA1E, PRKCG, GPS2, SIN3A, ACTB, KDM6B, and PRMT1. Furthermore, we compare the MBCA results with those for 983 primary tumor samples of breast invasive carcinoma (BRCA) obtained from The Cancer Genome Atlas (TCGA). The comparison shows that the ROC-AUC of EARN reaches 99.24% for MBCA and 99.79% for BRCA. In each case, this is better than the result of any of the three individual classifiers.
CONCLUSIONS: This research, using an integrative approach, helps precision oncologists design compact targeted panels that eliminate the need for whole-genome/exome sequencing. A schematic representation of the proposed model is presented as the graphical abstract. GRAPHIC ABSTRACT: [Image: see text] SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12920-021-00974-3.
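The abstract describes the ensemble decision as an aggregation of per-classifier predicted scores, followed by gene prioritization and a ROC-AUC comparison. The sketch below illustrates that idea only; it is not the authors' implementation. The abstract does not state the aggregation rule, so a plain mean is assumed here. The function names (`aggregate_scores`, `rank_genes`, `roc_auc`), the placeholder gene names, and all numeric scores are hypothetical and chosen purely for illustration.

```python
from statistics import mean


def aggregate_scores(per_model_scores):
    """Average each gene's predicted driver score across models.

    per_model_scores maps a model name to a dict of gene -> score.
    ASSUMPTION: a simple mean; the abstract does not give the actual rule.
    Only genes scored by every model are kept.
    """
    models = list(per_model_scores.values())
    shared = set(models[0]).intersection(*models[1:])
    return {gene: mean(m[gene] for m in models) for gene in shared}


def rank_genes(ensemble_scores, top_k):
    """Return the top_k genes by descending score (ties broken by name)."""
    ordered = sorted(ensemble_scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [gene for gene, _ in ordered[:top_k]]


def roc_auc(labels, scores):
    """ROC-AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Made-up scores for three placeholder genes (not results from the paper).
example_scores = {
    "ANN": {"GENE_A": 0.9, "GENE_B": 0.2, "GENE_C": 0.6},
    "RF": {"GENE_A": 0.8, "GENE_B": 0.4, "GENE_C": 0.5},
    "SVM": {"GENE_A": 0.7, "GENE_B": 0.3, "GENE_C": 0.7},
}
ensemble = aggregate_scores(example_scores)
top_two = rank_genes(ensemble, top_k=2)

# Hypothetical driver (1) / passenger (0) labels for the same genes.
genes = ["GENE_A", "GENE_B", "GENE_C"]
auc = roc_auc([1, 0, 1], [ensemble[g] for g in genes])
```

In practice the per-model scores would come from trained ANN, RF, and SVM classifiers, and a weighted mean or rank-based vote could replace the plain mean without changing the rest of the sketch.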
format Online
Article
Text
id pubmed-8105935
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8105935 2021-05-10 EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer Mirsadeghi, Leila Haji Hosseini, Reza Banaei-Moghaddam, Ali Mohammad Kavousi, Kaveh BMC Med Genomics Research
BioMed Central 2021-05-07 /pmc/articles/PMC8105935/ /pubmed/33962648 http://dx.doi.org/10.1186/s12920-021-00974-3 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title EARN: an ensemble machine learning algorithm to predict driver genes in metastatic breast cancer
topic Research