
A feature selection-based framework to identify biomarkers for cancer diagnosis: A focus on lung adenocarcinoma

Bibliographic Details
Main authors: Abdelwahab, Omar; Awad, Nourelislam; Elserafy, Menattallah; Badr, Eman
Format: Online article, text
Language: English
Published: Public Library of Science, 2022
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9447897/
https://www.ncbi.nlm.nih.gov/pubmed/36067196
http://dx.doi.org/10.1371/journal.pone.0269126
Collection: PubMed
Abstract: Lung cancer (LC) is one of the most common cancers worldwide, and lung adenocarcinoma (LUAD) is its most common type. Although RNA-seq and microarray experiments provide vast amounts of gene expression data, most genes are irrelevant to clinical diagnosis. Feature selection (FS) techniques help overcome the high dimensionality and sparsity of such large-scale data. We propose a framework that applies an ensemble of feature selection techniques to identify genes highly correlated with LUAD. Using LUAD RNA-seq data from The Cancer Genome Atlas (TCGA), we applied mutual information (MI) and recursive feature elimination (RFE) feature selection together with a support vector machine (SVM) classification model. We also used Random Forest (RF) as an embedded FS technique. The results were integrated, and candidate biomarker genes were identified across all techniques. The proposed framework identified 12 potential biomarkers that are highly correlated with different LC types, especially LUAD. A predictive model trained on the expression profiles of the identified biomarkers achieved a performance of 97.99%. In addition, differential gene expression analysis showed that all 12 genes were significantly differentially expressed between normal and LUAD tissues, and previous reports have strongly linked them to LUAD. We propose that using multiple feature selection methods effectively reduces the number of identified biomarkers and directly affects their biological relevance.
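The abstract names the framework's steps (MI filtering, SVM-based RFE, Random Forest as an embedded method, integration of the selected genes, and a predictive model) without implementation detail. The sketch below shows one way such an ensemble could be wired together; it is not the authors' code. It assumes scikit-learn, uses a synthetic matrix from `make_classification` in place of the TCGA LUAD RNA-seq data, and treats "integration" as keeping the genes selected by every method, relaxing to fewer votes only if that intersection is empty. The function names (`select_mi`, `select_svm_rfe`, `select_rf`, `integrate`, `run`), the value of `k`, and all hyperparameters are hypothetical choices for illustration.

```python
"""Minimal sketch of an ensemble feature-selection pipeline (MI filter, SVM-RFE
wrapper, random-forest embedded ranking), assuming scikit-learn. Synthetic data
stands in for a TCGA LUAD expression matrix; nothing here reproduces the paper."""
from collections import Counter
from functools import partial

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import RFE, SelectKBest, mutual_info_classif
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC, LinearSVC


def select_mi(X, y, k, seed=0):
    """Filter: keep the k genes with the highest mutual information with the label."""
    score = partial(mutual_info_classif, random_state=seed)
    selector = SelectKBest(score_func=score, k=k).fit(X, y)
    return {int(i) for i in np.flatnonzero(selector.get_support())}


def select_svm_rfe(X, y, k):
    """Wrapper: recursively drop the genes with the smallest linear-SVM weights."""
    X_std = StandardScaler().fit_transform(X)
    rfe = RFE(LinearSVC(dual=False, max_iter=5000), n_features_to_select=k, step=0.1)
    rfe.fit(X_std, y)
    return {int(i) for i in np.flatnonzero(rfe.support_)}


def select_rf(X, y, k, seed=0):
    """Embedded: keep the k genes with the largest impurity-based RF importances."""
    rf = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X, y)
    return {int(i) for i in np.argsort(rf.feature_importances_)[::-1][:k]}


def integrate(selections, min_votes):
    """Keep genes chosen by at least `min_votes` methods (all methods = intersection)."""
    votes = Counter(g for chosen in selections for g in chosen)
    return sorted(g for g, n in votes.items() if n >= min_votes)


def run(n_samples=400, n_features=1000, k=50, seed=0):
    # Synthetic two-class "expression" matrix (hypothetical tumor vs normal labels).
    X, y = make_classification(n_samples=n_samples, n_features=n_features,
                               n_informative=20, n_redundant=0, random_state=seed)
    # Select genes on the training split only, so the test score is not inflated.
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed)
    selections = [select_mi(X_tr, y_tr, k, seed),
                  select_svm_rfe(X_tr, y_tr, k),
                  select_rf(X_tr, y_tr, k, seed)]
    # Prefer the strict intersection; relax the vote threshold only if it is empty.
    for min_votes in range(len(selections), 0, -1):
        genes = integrate(selections, min_votes)
        if genes:
            break
    # Train an SVM on the candidate genes and score it on held-out samples.
    model = make_pipeline(StandardScaler(), SVC(kernel="linear"))
    model.fit(X_tr[:, genes], y_tr)
    return genes, min_votes, model.score(X_te[:, genes], y_te)


if __name__ == "__main__":
    genes, min_votes, acc = run()
    print(f"{len(genes)} candidate genes (votes >= {min_votes}); held-out accuracy {acc:.3f}")
```

Running the script prints a gene count and a held-out accuracy for synthetic data only; these figures are unrelated to the 12 genes or the 97.99% reported in the abstract. Selecting genes on the training split alone keeps the held-out score free of selection bias, a design choice assumed here rather than taken from the paper.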
Record ID: pubmed-9447897
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS One; article type: Research Article (index entry pubmed-9447897, 2022-09-07)
Published online: 2022-09-06, Public Library of Science
Copyright: © 2022 Abdelwahab et al. This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.