
GrpClassifierEC: a novel classification approach based on the ensemble clustering space

BACKGROUND: Advances in molecular biology have resulted in big and complicated data sets, so a clustering approach that is able to capture the actual structure and the hidden patterns of the data is required. Moreover, the geometric space may not reflect the actual similarity between the differ...


Bibliographic Details
Main authors: Abdallah, Loai; Yousef, Malik
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017541/
https://www.ncbi.nlm.nih.gov/pubmed/32082410
http://dx.doi.org/10.1186/s13015-020-0162-7
_version_ 1783497216630980608
author Abdallah, Loai
Yousef, Malik
author_facet Abdallah, Loai
Yousef, Malik
author_sort Abdallah, Loai
collection PubMed
description BACKGROUND: Advances in molecular biology have resulted in big and complicated data sets, so a clustering approach that is able to capture the actual structure and the hidden patterns of the data is required. Moreover, the geometric space may not reflect the actual similarity between the different objects. As a result, in this research we use a clustering-based space that converts the geometric space of the molecular data to a categorical space based on clustering results. We then use this space to develop a new classification algorithm. RESULTS: In this study, we propose a new classification method named GrpClassifierEC that replaces the given data space with a categorical space based on ensemble clustering (EC). The EC space is defined by tracking the membership of the points over multiple runs of clustering algorithms. Different points that were included in the same clusters are represented as a single point, and our algorithm assigns all of these points to a single class. The similarity between two objects is defined as the number of times that these objects did not belong to the same cluster. To evaluate the suggested method, we compare its results to the k-nearest neighbors, decision tree and random forest classification algorithms on several benchmark datasets. The results confirm that the suggested new algorithm GrpClassifierEC outperforms the other algorithms. CONCLUSIONS: Our algorithm can be integrated with many other algorithms. In this research, we use only the k-means clustering algorithm with different k values. In future research, we propose several directions: (1) checking the effect of the clustering algorithm used to build the ensemble clustering space, (2) finding poor clustering results based on the training data, and (3) reducing the volume of the data by combining similar points based on the EC. AVAILABILITY AND IMPLEMENTATION: The KNIME workflow implementing GrpClassifierEC is available at https://malikyousef.com
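The abstract describes the EC space only in words, so the following is a minimal Python sketch of one way to read it, not the authors' KNIME workflow. Every function name here (`kmeans_labels`, `ec_space`, `ec_distance`, `group_majority_labels`) is hypothetical, the tiny k-means is a stand-in for whatever clustering node the workflow uses, and labelling each group by majority vote is an assumption, since the abstract says only that grouped points receive a single class. Note that the count of runs in which two points fall into different clusters grows as the points become less alike, so the sketch treats it as a distance.

```python
import random
from collections import Counter, defaultdict


def kmeans_labels(points, k, rng, iters=20):
    """Tiny Lloyd's k-means (illustrative stand-in); returns one cluster index per point."""
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assign each point to its nearest center (squared Euclidean distance).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centers[c])))
            for p in points
        ]
        # Move each center to the mean of its members; leave it alone if the cluster is empty.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


def ec_space(points, ks, seed=0):
    """Map each point to a categorical row: its cluster index in each k-means run."""
    rng = random.Random(seed)
    runs = [kmeans_labels(points, k, rng) for k in ks]
    return [tuple(run[i] for run in runs) for i in range(len(points))]


def ec_distance(u, v):
    """Number of runs in which the two points did NOT share a cluster."""
    return sum(a != b for a, b in zip(u, v))


def group_majority_labels(ec_rows, y):
    """Group points with identical EC rows; label each group by majority vote (assumption)."""
    groups = defaultdict(list)
    for row, label in zip(ec_rows, y):
        groups[row].append(label)
    return {row: Counter(labels).most_common(1)[0][0] for row, labels in groups.items()}


# Toy data: two well-separated blobs with made-up class labels.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
y = ["a", "a", "a", "b", "b", "b"]
rows = ec_space(points, ks=[2, 3])          # one categorical row per point
group_label = group_majority_labels(rows, y)  # one class per group of identical rows
```

Points whose rows are identical collapse into one key of `group_label`, which mirrors the statement that points included in the same clusters are represented as a single point. How an unseen point is mapped into the EC space is not stated in the abstract and is left out of the sketch.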
format Online
Article
Text
id pubmed-7017541
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7017541 2020-02-20 GrpClassifierEC: a novel classification approach based on the ensemble clustering space Abdallah, Loai Yousef, Malik Algorithms Mol Biol Research BioMed Central 2020-02-13 /pmc/articles/PMC7017541/ /pubmed/32082410 http://dx.doi.org/10.1186/s13015-020-0162-7 Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Research
Abdallah, Loai
Yousef, Malik
GrpClassifierEC: a novel classification approach based on the ensemble clustering space
title GrpClassifierEC: a novel classification approach based on the ensemble clustering space
title_sort grpclassifierec: a novel classification approach based on the ensemble clustering space
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017541/
https://www.ncbi.nlm.nih.gov/pubmed/32082410
http://dx.doi.org/10.1186/s13015-020-0162-7