Extreme Gradient Boosting Tuned with Metaheuristic Algorithms for Predicting Myeloid NGS Onco-Somatic Variant Pathogenicity


Bibliographic Details
Main Authors: Pellegrino, Eric; Camilla, Clara; Abbou, Norman; Beaufils, Nathalie; Pissier, Christel; Gabert, Jean; Nanni-Metellus, Isabelle; Ouafik, L’Houcine
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10376905/
https://www.ncbi.nlm.nih.gov/pubmed/37508780
http://dx.doi.org/10.3390/bioengineering10070753
author Pellegrino, Eric
Camilla, Clara
Abbou, Norman
Beaufils, Nathalie
Pissier, Christel
Gabert, Jean
Nanni-Metellus, Isabelle
Ouafik, L’Houcine
author_sort Pellegrino, Eric
collection PubMed
description The advent of next-generation sequencing (NGS) technologies has revolutionized bioinformatics and genomics, particularly onco-somatic genetics. NGS has provided a wealth of information about the genetic changes that underlie cancer and has considerably improved our ability to diagnose and treat it. However, the large volume of data generated by NGS makes the variants difficult to interpret. To address this, machine learning algorithms such as Extreme Gradient Boosting (XGBoost) have become increasingly important tools for analyzing NGS data. In this paper, we present a machine learning tool that uses XGBoost to predict the pathogenicity of a mutation in the myeloid panel. We optimized the performance of XGBoost using metaheuristic algorithms and compared our predictions with the decisions of biologists and with other prediction tools. The myeloid panel is a critical component in the diagnosis and treatment of myeloid neoplasms; sequencing this panel identifies specific genetic mutations, enabling more accurate diagnoses and tailored treatment plans. We trained the XGBoost algorithm on datasets collected from our myeloid panel NGS analysis, comprising 15,977 variants: 13,221 single-nucleotide variants (SNVs), 73 multi-nucleotide variants (MNVs), and 2,683 insertions/deletions (INDELs). The optimal XGBoost hyperparameters were found with Differential Evolution (DE), yielding an accuracy of 99.35%, precision of 98.70%, specificity of 98.71%, and sensitivity of 1 (i.e., 100%).
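The abstract names Differential Evolution (DE) as the metaheuristic that found the best XGBoost hyperparameters but does not describe the procedure. The following is only a minimal sketch of a common DE variant (rand/1/bin), not the authors' implementation. Everything in it is an assumption made for illustration: the function names, the population settings, the two tuned parameters and their bounds, and in particular `toy_validation_error`, a hypothetical stand-in for the validation error one would obtain by training a model with the candidate hyperparameters.

```python
import random


def differential_evolution(objective, bounds, pop_size=20, f=0.8, cr=0.9,
                           generations=100, seed=0):
    """Minimize `objective` over box `bounds` with a rand/1/bin DE scheme."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Start from candidates drawn uniformly inside the bounds.
    pop = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    scores = [objective(ind) for ind in pop]

    for _ in range(generations):
        for i in range(pop_size):
            # Pick three distinct population members other than the target.
            a, b, c = rng.sample([j for j in range(pop_size) if j != i], 3)
            forced = rng.randrange(dim)  # at least one gene comes from the mutant
            trial = []
            for d in range(dim):
                if rng.random() < cr or d == forced:
                    # Mutation: base vector plus scaled difference vector.
                    value = pop[a][d] + f * (pop[b][d] - pop[c][d])
                    lo, hi = bounds[d]
                    value = min(max(value, lo), hi)  # clip to the search box
                else:
                    value = pop[i][d]
                trial.append(value)
            # Greedy selection: keep the trial only if it is no worse.
            trial_score = objective(trial)
            if trial_score <= scores[i]:
                pop[i], scores[i] = trial, trial_score

    best = min(range(pop_size), key=scores.__getitem__)
    return pop[best], scores[best]


def toy_validation_error(params):
    """Hypothetical stand-in for a model's validation error (not real data)."""
    learning_rate, max_depth = params
    return (learning_rate - 0.1) ** 2 + ((max_depth - 6.0) / 10.0) ** 2


# Assumed search ranges for two illustrative hyperparameters.
BOUNDS = [(0.01, 0.5), (2.0, 12.0)]
best_params, best_error = differential_evolution(toy_validation_error, BOUNDS)
print(best_params, best_error)
```

In a real tuning run, the objective would train and cross-validate a classifier for each candidate, and integer-valued parameters such as tree depth would be rounded before use; both steps are omitted here to keep the sketch dependency-free.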
format Online
Article
Text
id pubmed-10376905
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10376905 2023-07-29 Bioengineering (Basel) Article
MDPI 2023-06-23 /pmc/articles/PMC10376905/ /pubmed/37508780 http://dx.doi.org/10.3390/bioengineering10070753 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Extreme Gradient Boosting Tuned with Metaheuristic Algorithms for Predicting Myeloid NGS Onco-Somatic Variant Pathogenicity
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10376905/
https://www.ncbi.nlm.nih.gov/pubmed/37508780
http://dx.doi.org/10.3390/bioengineering10070753