An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples

BACKGROUND: Machine learning is one kind of machine intelligence technique that learns from data and detects inherent patterns from large, complex datasets. Due to this capability, machine learning techniques are widely used in medical applications, especially where large-scale genomic and proteomic...

Bibliographic Details
Main authors: Bose, Shilpi, Das, Chandra, Banerjee, Abhik, Ghosh, Kuntal, Chattopadhyay, Matangini, Chattopadhyay, Samiran, Barik, Aishwarya
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8459790/
https://www.ncbi.nlm.nih.gov/pubmed/34616883
http://dx.doi.org/10.7717/peerj-cs.671
author Bose, Shilpi
Das, Chandra
Banerjee, Abhik
Ghosh, Kuntal
Chattopadhyay, Matangini
Chattopadhyay, Samiran
Barik, Aishwarya
collection PubMed
description BACKGROUND: Machine learning is a machine intelligence technique that learns from data and detects inherent patterns in large, complex datasets. Because of this capability, machine learning techniques are widely used in medical applications, especially those involving large-scale genomic and proteomic data. Cancer classification based on bio-molecular profiling data is an important topic for medical applications because it improves the diagnostic accuracy of cancer and helps bring cancer treatment to a successful conclusion. Hence, machine learning techniques are widely used in cancer detection and prognosis. METHODS: In this article, a new ensemble machine learning classification model named Multiple Filtering and Supervised Attribute Clustering algorithm based Ensemble Classification model (MFSAC-EC) is proposed, which can handle both the class imbalance problem and the high dimensionality of microarray datasets. The model first generates a number of bootstrapped datasets from the original training data, applying an oversampling procedure to handle the class imbalance problem. The proposed MFSAC method is then applied to each of these bootstrapped datasets to generate sub-datasets, each of which contains a subset of the most relevant/informative attributes of the original dataset. The MFSAC method is a feature selection technique that combines multiple filters with a new supervised attribute clustering algorithm. A base classifier is then constructed separately for every sub-dataset, and finally the predictions of these base classifiers are combined by majority voting to form the MFSAC-based ensemble classifier. In addition, the most informative attributes are selected as important features based on their frequency of occurrence in these sub-datasets.
RESULTS: To assess the performance of the proposed MFSAC-EC model, it is applied to different high-dimensional microarray gene expression datasets for cancer sample classification. The proposed model is compared with well-known existing models to establish its effectiveness. The experimental results show that the generalization performance/testing accuracy of the proposed classifier is significantly better than that of other well-known existing models. The proposed model can also identify many important attributes/biomarker genes.
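The pipeline outlined in METHODS (bootstrap with oversampling, per-sample feature selection, one base classifier per sub-dataset, majority voting, and feature ranking by frequency) can be illustrated with a minimal sketch. This is not the authors' implementation: the feature filter below is a simple stand-in (spread of class means) rather than the MFSAC method, the base classifier is assumed to be a nearest-centroid rule, and all function names, parameters, and the toy data are hypothetical.

```python
import random
from collections import Counter

def bootstrap_oversample(X, y, rng):
    """Draw a bootstrap sample, then oversample each class up to the largest class size."""
    n = len(X)
    by_class = {}
    for i in (rng.randrange(n) for _ in range(n)):
        by_class.setdefault(y[i], []).append(i)
    # Assumption: if a class is missing from the draw, fall back to its original samples.
    for label in sorted(set(y)):
        if label not in by_class:
            by_class[label] = [i for i in range(n) if y[i] == label]
    target = max(len(members) for members in by_class.values())
    chosen = []
    for members in by_class.values():
        chosen.extend(members)
        chosen.extend(rng.choice(members) for _ in range(target - len(members)))
    return [X[i] for i in chosen], [y[i] for i in chosen]

def centroids(X, y, feats):
    """Per-class mean vector restricted to the given feature indices."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(feats))
        for k, f in enumerate(feats):
            acc[k] += row[f]
    return {c: [s / counts[c] for s in acc] for c, acc in sums.items()}

def select_features(X, y, k):
    """Stand-in filter (NOT MFSAC): keep the k features whose class means differ most."""
    all_feats = list(range(len(X[0])))
    cents = centroids(X, y, all_feats)
    def score(f):
        vals = [c[f] for c in cents.values()]
        return max(vals) - min(vals)
    return sorted(all_feats, key=score, reverse=True)[:k]

def fit_ensemble(X, y, n_models=15, k=2, seed=0):
    """Train one nearest-centroid model per bootstrapped sub-dataset; count selected features."""
    rng = random.Random(seed)
    models, freq = [], Counter()
    for _ in range(n_models):
        Xb, yb = bootstrap_oversample(X, y, rng)
        feats = select_features(Xb, yb, k)
        freq.update(feats)
        models.append((feats, centroids(Xb, yb, feats)))
    return models, freq

def predict(models, row):
    """Majority vote over the base classifiers' predicted labels."""
    votes = Counter()
    for feats, cents in models:
        def dist(c):
            return sum((row[f] - m) ** 2 for f, m in zip(feats, cents[c]))
        votes[min(sorted(cents), key=dist)] += 1
    return votes.most_common(1)[0][0]

# Hypothetical toy data: features 0 and 1 separate the classes, 2 and 3 are noise;
# class 1 is the minority class.
X = [[0.1, 0.2, 0.5, 0.4], [0.0, 0.1, 0.6, 0.5], [0.2, 0.0, 0.4, 0.6],
     [0.1, 0.1, 0.5, 0.5], [0.0, 0.2, 0.6, 0.4], [0.2, 0.1, 0.4, 0.5],
     [5.0, 5.1, 0.5, 0.5], [5.2, 4.9, 0.4, 0.6]]
y = [0, 0, 0, 0, 0, 0, 1, 1]

models, freq = fit_ensemble(X, y)
print(predict(models, [5.1, 5.0, 0.5, 0.5]))
print(freq.most_common(2))
```

The frequency counter plays the role of the abstract's "frequency of occurrence in these sub-datasets": features chosen in many bootstrapped sub-datasets are reported as the most informative. Swapping in a real filter or clustering step only requires replacing `select_features`.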
format Online
Article
Text
id pubmed-8459790
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-8459790 2021-10-05 An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples Bose, Shilpi Das, Chandra Banerjee, Abhik Ghosh, Kuntal Chattopadhyay, Matangini Chattopadhyay, Samiran Barik, Aishwarya PeerJ Comput Sci Bioinformatics PeerJ Inc. 2021-09-16 /pmc/articles/PMC8459790/ /pubmed/34616883 http://dx.doi.org/10.7717/peerj-cs.671 Text en © 2021 Bose et al. https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, reproduction and adaptation in any medium and for any purpose provided that it is properly attributed. For attribution, the original author(s), title, publication source (PeerJ Computer Science) and either DOI or URL of the article must be cited.
title An ensemble machine learning model based on multiple filtering and supervised attribute clustering algorithm for classifying cancer samples
topic Bioinformatics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8459790/
https://www.ncbi.nlm.nih.gov/pubmed/34616883
http://dx.doi.org/10.7717/peerj-cs.671