Filter and Wrapper Stacking Ensemble (FWSE): a robust approach for reliable biomarker discovery in high-dimensional omics data

Selecting informative features, such as accurate biomarkers for disease diagnosis, prognosis and response to treatment, is an essential task in the field of bioinformatics. Medical data often contain thousands of features and identifying potential biomarkers is challenging due to the small number of sam...

Bibliographic Details
Main authors: Budhraja, Sugam; Doborjeh, Maryam; Singh, Balkaran; Tan, Samuel; Doborjeh, Zohreh; Lai, Edmund; Merkin, Alexander; Lee, Jimmy; Goh, Wilson; Kasabov, Nikola
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10605029/
https://www.ncbi.nlm.nih.gov/pubmed/37889118
http://dx.doi.org/10.1093/bib/bbad382
_version_ 1785126975910707200
author Budhraja, Sugam
Doborjeh, Maryam
Singh, Balkaran
Tan, Samuel
Doborjeh, Zohreh
Lai, Edmund
Merkin, Alexander
Lee, Jimmy
Goh, Wilson
Kasabov, Nikola
collection PubMed
description Selecting informative features, such as accurate biomarkers for disease diagnosis, prognosis and response to treatment, is an essential task in the field of bioinformatics. Medical data often contain thousands of features, and identifying potential biomarkers is challenging because of the small number of samples in the data, the dependence of results on the selection method and poor reproducibility. This paper proposes a novel ensemble feature selection method, named Filter and Wrapper Stacking Ensemble (FWSE), to identify reproducible biomarkers from high-dimensional omics data. In FWSE, filter feature selection methods are run on numerous subsets of the data to eliminate irrelevant features, and then wrapper feature selection methods are applied to rank the top features. The method was validated on four high-dimensional medical datasets related to mental illnesses and cancer. The results indicate that the features selected by FWSE are stable and statistically more significant than those obtained by existing methods, while also demonstrating biological relevance. Furthermore, FWSE is a generic method, applicable to various high-dimensional datasets in the fields of machine intelligence and bioinformatics.
format Online
Article
Text
id pubmed-10605029
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10605029 2023-10-28 Brief Bioinform Problem Solving Protocol Oxford University Press 2023-10-26 /pmc/articles/PMC10605029/ /pubmed/37889118 http://dx.doi.org/10.1093/bib/bbad382 Text en © The Author(s) 2023. Published by Oxford University Press.
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Filter and Wrapper Stacking Ensemble (FWSE): a robust approach for reliable biomarker discovery in high-dimensional omics data
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10605029/
https://www.ncbi.nlm.nih.gov/pubmed/37889118
http://dx.doi.org/10.1093/bib/bbad382
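
The description field of this record summarizes FWSE as two stages: filter methods run on many subsets of the data to discard irrelevant features, followed by wrapper methods that rank the survivors. The sketch below illustrates that general filter-then-wrapper pattern only; it is not the authors' implementation. Every concrete choice in it is an assumption made for illustration: the standardized mean-difference filter score, the 80% random subsets, the `top_k` and `min_freq` thresholds, the nearest-centroid classifier, the greedy forward wrapper, and all function and parameter names.

```python
import random
from statistics import mean, pstdev


def filter_score(X, y, j):
    """Assumed filter criterion: absolute standardized difference of class means."""
    a = [row[j] for row, label in zip(X, y) if label == 0]
    b = [row[j] for row, label in zip(X, y) if label == 1]
    if not a or not b:
        return 0.0
    spread = pstdev(a) + pstdev(b) + 1e-9  # small constant avoids division by zero
    return abs(mean(a) - mean(b)) / spread


def filter_stage(X, y, n_subsets=30, top_k=5, min_freq=0.5, seed=0):
    """Score features on random subsets; keep those that reach the top_k often enough."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_subsets):
        idx = rng.sample(range(n), k=max(2, int(0.8 * n)))
        Xs, ys = [X[i] for i in idx], [y[i] for i in idx]
        ranked = sorted(range(p), key=lambda j: filter_score(Xs, ys, j), reverse=True)
        for j in ranked[:top_k]:
            counts[j] += 1
    return [j for j in range(p) if counts[j] / n_subsets >= min_freq]


def loo_accuracy(X, y, feats):
    """Leave-one-out accuracy of a nearest-centroid classifier on the given features."""
    correct = 0
    for i in range(len(X)):
        dist = {}
        for c in (0, 1):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == c]
            centroid = [mean(r[j] for r in rows) for j in feats]
            dist[c] = sum((X[i][j] - m) ** 2 for j, m in zip(feats, centroid))
        correct += min(dist, key=dist.get) == y[i]
    return correct / len(X)


def wrapper_stage(X, y, candidates):
    """Greedy forward selection; the order in which features are added is the ranking."""
    ranking, remaining = [], list(candidates)
    while remaining:
        best = max(remaining, key=lambda j: loo_accuracy(X, y, ranking + [j]))
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Synthetic toy data (not from the paper): 20 samples, 20 features,
# of which only features 0 and 1 differ between the two classes.
rng = random.Random(1)
y = [0] * 10 + [1] * 10
X = [[rng.gauss(3.0 * label if j < 2 else 0.0, 1.0) for j in range(20)] for label in y]

survivors = filter_stage(X, y)
ranking = wrapper_stage(X, y, survivors)
print("features kept by the filter stage:", survivors)
print("wrapper ranking of kept features:", ranking)
```

Counting how often a feature reaches the top of the filter ranking across subsets is one simple way to favor features whose relevance does not depend on a particular sample split, which is the stability concern the description raises. A real analysis would replace the toy scoring and classifier with established filter and wrapper methods.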