
BiGAMi: Bi-Objective Genetic Algorithm Fitness Function for Feature Selection on Microbiome Datasets


Bibliographic Details
Main authors: Leske, Mike; Bottacini, Francesca; Afli, Haithem; Andrade, Bruno G. N.
Format: Online Article Text
Language: English
Published: MDPI 2022
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9149982/
https://www.ncbi.nlm.nih.gov/pubmed/35645350
http://dx.doi.org/10.3390/mps5030042
collection PubMed
description The relationship between the host and the microbiome, the assemblage of microorganisms (including bacteria, archaea, fungi, and viruses), has been shown to be crucial for host health and disease development. The high dimensionality of microbiome datasets is often cited as a major difficulty for data analysis, including the use of machine-learning (ML) and deep-learning (DL) models. Here, we present BiGAMi, a bi-objective genetic algorithm (GA) fitness function for feature selection in microbial datasets to train high-performing phenotype classifiers. The proposed fitness function allowed us to build classifiers that outperformed the baseline performance reported by the original studies while using as few as 0.04% to 2.32% of the features of the original dataset. In 35 out of 42 performance comparisons between BiGAMi and the other feature selection methods evaluated here (sequential forward selection, SelectKBest, and GARS), BiGAMi achieved its results by selecting 6–93% fewer features. This study showed that applying a bi-objective GA fitness function to microbiome datasets succeeded in selecting small subsets of bacteria whose contribution to well-understood diseases and to the host state had already been experimentally demonstrated. Applying this feature selection approach to novel diseases is expected to quickly reveal the microbes most relevant to a specific condition.
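The abstract does not spell out the two objectives, so the following is only a minimal sketch of what a bi-objective fitness function for feature selection can look like. It assumes the objectives are classifier performance (to maximise) and the number of selected features (to minimise). The nearest-centroid classifier, the toy data, and the names `fitness`, `dominates`, and `pareto_front` are illustrative assumptions, not the BiGAMi implementation.

```python
# Hypothetical sketch: bi-objective fitness for a feature mask.
# Objective 1 (maximise): leave-one-out accuracy of a stand-in classifier.
# Objective 2 (minimise): number of selected features.

def nearest_centroid_accuracy(X, y, mask):
    """Leave-one-out accuracy of a nearest-centroid classifier on masked features."""
    cols = [j for j, keep in enumerate(mask) if keep]
    if not cols:
        return 0.0  # an empty feature subset cannot classify anything
    correct = 0
    for i in range(len(X)):
        sums, counts = {}, {}
        for k, (row, label) in enumerate(zip(X, y)):
            if k == i:
                continue  # hold out sample i
            acc = sums.setdefault(label, [0.0] * len(cols))
            for c, j in enumerate(cols):
                acc[c] += row[j]
            counts[label] = counts.get(label, 0) + 1

        def sq_dist(label):
            # Squared distance from held-out sample i to the class centroid.
            return sum((X[i][j] - sums[label][c] / counts[label]) ** 2
                       for c, j in enumerate(cols))

        predicted = min(sums, key=sq_dist)
        correct += predicted == y[i]
    return correct / len(X)


def fitness(mask, X, y):
    """Return the pair (accuracy, number of selected features)."""
    return nearest_centroid_accuracy(X, y, mask), sum(mask)


def dominates(a, b):
    """True if fitness a is at least as good as b on both objectives and better on one."""
    return a[0] >= b[0] and a[1] <= b[1] and (a[0] > b[0] or a[1] < b[1])


def pareto_front(population, X, y):
    """Keep the masks whose fitness is not dominated by any other mask."""
    scored = [(m, fitness(m, X, y)) for m in population]
    return [(m, f) for m, f in scored
            if not any(dominates(g, f) for _, g in scored)]


# Toy data (assumed): only feature 0 separates the two classes.
X = [[0.0, 5, 1], [0.1, 1, 4], [0.2, 3, 2],
     [1.0, 4, 3], [1.1, 2, 5], [0.9, 3, 1]]
y = [0, 0, 0, 1, 1, 1]

front = pareto_front([[1, 0, 0], [1, 1, 1], [0, 1, 0]], X, y)
```

In a genetic algorithm, a fitness pair of this kind would typically feed a multi-objective selection step, so that smaller masks are preferred whenever they do not cost accuracy.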
id pubmed-9149982
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Methods Protoc
Dates: 2022-05-23 (MDPI); 2022-05-31 (PubMed record pubmed-9149982)
License: © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).