
MEvA-X: a hybrid multiobjective evolutionary tool using an XGBoost classifier for biomarkers discovery on biomedical datasets

MOTIVATION: Biomarker discovery is one of the most frequent pursuits in bioinformatics and is crucial for precision medicine, disease prognosis, and drug discovery. A common challenge of biomarker discovery applications is the low ratio of samples over features for the selection of a reliable not-re...

Bibliographic Details
Main authors: Panagiotopoulos, Konstantinos, Korfiati, Aigli, Theofilatos, Konstantinos, Hurwitz, Peter, Deriu, Marco Agostino, Mavroudi, Seferina
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10354005/
https://www.ncbi.nlm.nih.gov/pubmed/37326976
http://dx.doi.org/10.1093/bioinformatics/btad384
author Panagiotopoulos, Konstantinos
Korfiati, Aigli
Theofilatos, Konstantinos
Hurwitz, Peter
Deriu, Marco Agostino
Mavroudi, Seferina
collection PubMed
description MOTIVATION: Biomarker discovery is one of the most frequent pursuits in bioinformatics and is crucial for precision medicine, disease prognosis, and drug discovery. A common challenge of biomarker discovery applications is the low ratio of samples over features for the selection of a reliable not-redundant subset of features, but despite the development of efficient tree-based classification methods, such as the extreme gradient boosting (XGBoost), this limitation is still relevant. Moreover, existing approaches for optimizing XGBoost do not deal effectively with the class imbalance nature of the biomarker discovery problems, and the presence of multiple conflicting objectives, since they focus on the training of a single-objective model. In the current work, we introduce MEvA-X, a novel hybrid ensemble for feature selection (FS) and classification, combining a niche-based multiobjective evolutionary algorithm (EA) with the XGBoost classifier. MEvA-X deploys a multiobjective EA to optimize the hyperparameters of the classifier and perform FS, identifying a set of Pareto-optimal solutions and optimizing multiple objectives, including classification and model simplicity metrics. RESULTS: The performance of the MEvA-X tool was benchmarked using one omics dataset coming from a microarray gene expression experiment, and one clinical questionnaire-based dataset combined with demographic information. MEvA-X tool outperformed the state-of-the-art methods in the balanced categorization of classes, creating multiple low-complexity models and identifying important nonredundant biomarkers. The best-performing run of MEvA-X for the prediction of weight loss using gene expression data yields a small set of blood circulatory markers which are sufficient for this precision nutrition application but need further validation. AVAILABILITY AND IMPLEMENTATION: https://github.com/PanKonstantinos/MEvA-X.
format Online
Article
Text
id pubmed-10354005
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-10354005 2023-07-20 Bioinformatics Original Paper Oxford University Press 2023-06-16 /pmc/articles/PMC10354005/ /pubmed/37326976 http://dx.doi.org/10.1093/bioinformatics/btad384 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title MEvA-X: a hybrid multiobjective evolutionary tool using an XGBoost classifier for biomarkers discovery on biomedical datasets
topic Original Paper