A Computational Approach to Identification of Candidate Biomarkers in High-Dimensional Molecular Data
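Main authors: | Gerolami, Justin; Wong, Justin Jong Mun; Zhang, Ricky; Chen, Tong; Imtiaz, Tashifa; Smith, Miranda; Jamaspishvili, Tamara; Koti, Madhuri; Glasgow, Janice Irene; Mousavi, Parvin; Renwick, Neil; Tyryshkin, Kathrin |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2022 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9407361/ https://www.ncbi.nlm.nih.gov/pubmed/36010347 http://dx.doi.org/10.3390/diagnostics12081997 |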
author | Gerolami, Justin; Wong, Justin Jong Mun; Zhang, Ricky; Chen, Tong; Imtiaz, Tashifa; Smith, Miranda; Jamaspishvili, Tamara; Koti, Madhuri; Glasgow, Janice Irene; Mousavi, Parvin; Renwick, Neil; Tyryshkin, Kathrin |
---|---|
collection | PubMed |
description | Complex, high-dimensional datasets that are challenging to analyze are frequently produced by ‘-omics’ profiling. These datasets typically contain more genomic features than samples, which limits the use of multivariable statistical and machine learning-based approaches to their analysis. Effective alternative approaches are therefore urgently needed to identify features of interest in ‘-omics’ data. In this study, we present the molecular feature selection tool, a novel, ensemble-based feature selection application for identifying candidate biomarkers in ‘-omics’ data. As a proof of principle, we applied the molecular feature selection tool to identify a small set of immune-related genes as potential biomarkers of three prostate adenocarcinoma subtypes. We then tested the selected genes in a model that classifies the three subtypes and compared its results with those of models built using all genes and using all differentially expressed genes. The model built on genes identified with the molecular feature selection tool outperformed the other models in this study on all comparison metrics (accuracy, precision, recall, and F1-score) while using a substantially smaller set of genes. In addition, we developed a simple graphical user interface for the molecular feature selection tool, which is available as a free download. This user-friendly interface is a valuable tool for identifying potential biomarkers in gene expression datasets and an asset for biomarker discovery studies. |
format | Online Article Text |
id | pubmed-9407361 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | MDPI |
record_format | MEDLINE/PubMed |
journal | Diagnostics (Basel) |
published_online | MDPI, 2022-08-18 |
rights | © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/). |
title | A Computational Approach to Identification of Candidate Biomarkers in High-Dimensional Molecular Data |
topic | Article |