
Diagnosis of acute myeloid leukaemia on microarray gene expression data using categorical gradient boosted trees

We define an iterative method for dimensionality reduction using categorical gradient boosted trees and Shapley values, and create four machine learning models that could potentially be used as diagnostic tests for acute myeloid leukaemia (AML). For the final CatBoost model we use a dataset of 2177...


Bibliographic Details
Main Authors: Angelakis, Athanasios; Soulioti, Ioanna; Filippakis, Michael
Format: Online Article Text
Language: English
Published: Elsevier 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10582309/
https://www.ncbi.nlm.nih.gov/pubmed/37860531
http://dx.doi.org/10.1016/j.heliyon.2023.e20530
_version_ 1785122301262430208
author Angelakis, Athanasios
Soulioti, Ioanna
Filippakis, Michael
author_facet Angelakis, Athanasios
Soulioti, Ioanna
Filippakis, Michael
author_sort Angelakis, Athanasios
collection PubMed
description We define an iterative method for dimensionality reduction using categorical gradient boosted trees and Shapley values, and create four machine learning models that could potentially be used as diagnostic tests for acute myeloid leukaemia (AML). For the final CatBoost model we use a dataset of 2177 individuals, with 16 probe sets and age as features, to classify whether an individual has AML or is healthy. The dataset is multicentric and consists of data from 27 organizations, 25 cities, 15 countries and 4 continents. Under ten-fold cross-validation, the final model achieves specificity 0.9909, sensitivity 0.9985, F1-score 0.9976 and ROC-AUC 0.9962. On an inference dataset its performance is specificity 0.9909, sensitivity 0.9969, F1-score 0.9969 and ROC-AUC 0.9939. To the best of our knowledge, this is the best performance reported in the literature for the diagnosis of AML, whether on similar data or not. Moreover, there is no bibliographic reference that associates AML or any other type of cancer with the 16 probe sets used as features in our final model.
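The specificity, sensitivity and F1-score quoted in the description are standard functions of a binary confusion matrix. The following is a minimal sketch of those definitions, assuming AML is the positive class; the class name `ConfusionCounts`, the function names and the example counts are hypothetical illustrations and are not taken from the article or its data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts, with AML assumed to be the positive class."""
    tp: int  # AML samples predicted as AML
    fn: int  # AML samples predicted as healthy
    tn: int  # healthy samples predicted as healthy
    fp: int  # healthy samples predicted as AML


def sensitivity(c: ConfusionCounts) -> float:
    """Fraction of true AML samples that are detected (recall)."""
    return c.tp / (c.tp + c.fn)


def specificity(c: ConfusionCounts) -> float:
    """Fraction of healthy samples that are correctly cleared."""
    return c.tn / (c.tn + c.fp)


def f1_score(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall for the positive class."""
    return 2 * c.tp / (2 * c.tp + c.fp + c.fn)


# Hypothetical counts chosen only to exercise the formulas.
example = ConfusionCounts(tp=95, fn=5, tn=90, fp=10)
print(f"sensitivity={sensitivity(example):.4f}")
print(f"specificity={specificity(example):.4f}")
print(f"F1={f1_score(example):.4f}")
```

ROC-AUC is not included because it depends on the ranking of predicted scores rather than on a single confusion matrix.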
format Online
Article
Text
id pubmed-10582309
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-10582309 2023-10-19 Diagnosis of acute myeloid leukaemia on microarray gene expression data using categorical gradient boosted trees Angelakis, Athanasios Soulioti, Ioanna Filippakis, Michael Heliyon Research Article We define an iterative method for dimensionality reduction using categorical gradient boosted trees and Shapley values, and create four machine learning models that could potentially be used as diagnostic tests for acute myeloid leukaemia (AML). For the final CatBoost model we use a dataset of 2177 individuals, with 16 probe sets and age as features, to classify whether an individual has AML or is healthy. The dataset is multicentric and consists of data from 27 organizations, 25 cities, 15 countries and 4 continents. Under ten-fold cross-validation, the final model achieves specificity 0.9909, sensitivity 0.9985, F1-score 0.9976 and ROC-AUC 0.9962. On an inference dataset its performance is specificity 0.9909, sensitivity 0.9969, F1-score 0.9969 and ROC-AUC 0.9939. To the best of our knowledge, this is the best performance reported in the literature for the diagnosis of AML, whether on similar data or not. Moreover, there is no bibliographic reference that associates AML or any other type of cancer with the 16 probe sets used as features in our final model. Elsevier 2023-10-04 /pmc/articles/PMC10582309/ /pubmed/37860531 http://dx.doi.org/10.1016/j.heliyon.2023.e20530 Text en © 2023 The Author(s) https://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Research Article
Angelakis, Athanasios
Soulioti, Ioanna
Filippakis, Michael
Diagnosis of acute myeloid leukaemia on microarray gene expression data using categorical gradient boosted trees
title Diagnosis of acute myeloid leukaemia on microarray gene expression data using categorical gradient boosted trees
title_sort diagnosis of acute myeloid leukaemia on microarray gene expression data using categorical gradient boosted trees
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10582309/
https://www.ncbi.nlm.nih.gov/pubmed/37860531
http://dx.doi.org/10.1016/j.heliyon.2023.e20530