A Computational Approach to Identification of Candidate Biomarkers in High-Dimensional Molecular Data

Bibliographic Details
Main authors: Gerolami, Justin; Wong, Justin Jong Mun; Zhang, Ricky; Chen, Tong; Imtiaz, Tashifa; Smith, Miranda; Jamaspishvili, Tamara; Koti, Madhuri; Glasgow, Janice Irene; Mousavi, Parvin; Renwick, Neil; Tyryshkin, Kathrin
Format: Online Article Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9407361/
https://www.ncbi.nlm.nih.gov/pubmed/36010347
http://dx.doi.org/10.3390/diagnostics12081997
author Gerolami, Justin
Wong, Justin Jong Mun
Zhang, Ricky
Chen, Tong
Imtiaz, Tashifa
Smith, Miranda
Jamaspishvili, Tamara
Koti, Madhuri
Glasgow, Janice Irene
Mousavi, Parvin
Renwick, Neil
Tyryshkin, Kathrin
collection PubMed
description Complex high-dimensional datasets that are challenging to analyze are frequently produced through ‘-omics’ profiling. Typically, these datasets contain more genomic features than samples, limiting the use of multivariable statistical and machine learning-based approaches to analysis. Therefore, effective alternative approaches are urgently needed to identify features-of-interest in ‘-omics’ data. In this study, we present the molecular feature selection tool, a novel, ensemble-based, feature selection application for identifying candidate biomarkers in ‘-omics’ data. As proof-of-principle, we applied the molecular feature selection tool to identify a small set of immune-related genes as potential biomarkers of three prostate adenocarcinoma subtypes. Furthermore, we tested the selected genes in a model to classify the three subtypes and compared the results to models built using all genes and all differentially expressed genes. Genes identified with the molecular feature selection tool performed better than the other models in this study in all comparison metrics: accuracy, precision, recall, and F1-score using a significantly smaller set of genes. In addition, we developed a simple graphical user interface for the molecular feature selection tool, which is available for free download. This user-friendly interface is a valuable tool for the identification of potential biomarkers in gene expression datasets and is an asset for biomarker discovery studies.
format Online
Article
Text
id pubmed-9407361
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9407361 2022-08-26 Diagnostics (Basel) Article MDPI 2022-08-18 /pmc/articles/PMC9407361/ /pubmed/36010347 http://dx.doi.org/10.3390/diagnostics12081997 Text en © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title A Computational Approach to Identification of Candidate Biomarkers in High-Dimensional Molecular Data
topic Article