Blood cancer prediction using leukemia microarray gene data and hybrid logistic vector trees model

Bibliographic Details
Main Authors: Rupapara, Vaibhav; Rustam, Furqan; Aljedaani, Wajdi; Shahzad, Hina Fatima; Lee, Ernesto; Ashraf, Imran
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8770560/
https://www.ncbi.nlm.nih.gov/pubmed/35046459
http://dx.doi.org/10.1038/s41598-022-04835-6
_version_ 1784635400860467200
author Rupapara, Vaibhav
Rustam, Furqan
Aljedaani, Wajdi
Shahzad, Hina Fatima
Lee, Ernesto
Ashraf, Imran
collection PubMed
description Blood cancer has been a growing concern during the last decade and requires early diagnosis so that proper treatment can begin. The diagnosis process is costly and time-consuming, involving medical experts and several tests. Thus, an automatic diagnosis system for its accurate prediction is of significant importance. Diagnosis of blood cancer using leukemia microarray gene data and a machine learning approach has become an important area of medical research. Despite research efforts, accuracy and efficiency still need improvement. This study proposes an approach for blood cancer prediction using supervised machine learning. For the current study, a leukemia microarray gene dataset containing 22,283 genes is used. ADASYN resampling and Chi-squared (Chi2) feature selection are used to address the imbalanced and high-dimensional nature of the dataset. ADASYN generates artificial data to balance the dataset across target classes, and Chi2 selects the best features out of the 22,283 to train the learning models. For classification, a hybrid logistic vector trees classifier (LVTrees) is proposed, which combines logistic regression, a support vector classifier, and an extra trees classifier. Besides extensive experiments on the datasets, the performance is compared with state-of-the-art methods to determine the significance of the proposed approach. LVTrees outperforms all other models when used with ADASYN and Chi2, reaching 100% accuracy. Further, a statistical significance T-test is performed to show the efficacy of the proposed approach. Results using k-fold cross-validation confirm the superiority of the proposed model.
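The feature-selection and ensemble steps named in the description can be illustrated with a small, dependency-free sketch. It is not the authors' implementation. The Chi2 score below follows the common convention of comparing per-class feature sums against class-frequency expectations for non-negative features, and the record does not say how LVTrees combines its three member models, so the majority vote here is an assumption. The function names and the toy data are hypothetical.

```python
from collections import Counter


def chi2_scores(X, y):
    """Chi-squared score per feature, for non-negative feature values.

    Observed = sum of the feature within each class.
    Expected = class frequency * total of the feature over all samples.
    """
    n_samples = len(X)
    classes = sorted(set(y))
    class_freq = {c: sum(1 for label in y if label == c) / n_samples
                  for c in classes}
    scores = []
    for j in range(len(X[0])):
        total = sum(row[j] for row in X)
        score = 0.0
        for c in classes:
            observed = sum(row[j] for row, label in zip(X, y) if label == c)
            expected = class_freq[c] * total
            if expected > 0:  # skip features that are zero everywhere
                score += (observed - expected) ** 2 / expected
        scores.append(score)
    return scores


def select_top_k(X, y, k):
    """Keep the k highest-scoring features; return their indices and the reduced data."""
    scores = chi2_scores(X, y)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    keep = sorted(ranked[:k])
    return keep, [[row[j] for j in keep] for row in X]


def majority_vote(predictions):
    """Combine per-model label lists into one label per sample.

    Assumption: a plain majority vote; the record does not state the
    combination rule used by LVTrees. On a tie, the label voted first wins.
    """
    return [Counter(votes).most_common(1)[0][0] for votes in zip(*predictions)]


# Toy data (hypothetical): 4 samples, 3 "gene" features, 2 classes.
X = [[5, 1, 0], [4, 1, 1], [0, 1, 6], [1, 1, 5]]
y = [0, 0, 1, 1]

keep, X_reduced = select_top_k(X, y, k=2)
print(keep)  # [0, 2]: feature 1 is identical across classes and is dropped

# Stand-ins for the outputs of three already-trained member models.
print(majority_vote([[0, 0, 1, 1], [0, 1, 1, 1], [1, 0, 1, 0]]))  # [0, 0, 1, 1]
```

In practice a library such as scikit-learn would supply the Chi2 scorer and the three classifiers, and ADASYN resampling would be applied to the training data before fitting; those steps are omitted here to keep the sketch self-contained.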
format Online
Article
Text
id pubmed-8770560
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8770560 2022-01-20 Sci Rep Article. Nature Publishing Group UK 2022-01-19 /pmc/articles/PMC8770560/ /pubmed/35046459 http://dx.doi.org/10.1038/s41598-022-04835-6 Text en © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Blood cancer prediction using leukemia microarray gene data and hybrid logistic vector trees model
topic Article