GrpClassifierEC: a novel classification approach based on the ensemble clustering space
Main authors: | Abdallah, Loai; Yousef, Malik |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017541/ https://www.ncbi.nlm.nih.gov/pubmed/32082410 http://dx.doi.org/10.1186/s13015-020-0162-7 |
author | Abdallah, Loai; Yousef, Malik |
---|---|
collection | PubMed |
description | BACKGROUND: Advances in molecular biology have produced large and complex data sets, so a clustering approach that is able to capture the actual structure and the hidden patterns of the data is required. Moreover, the geometric space may not reflect the actual similarity between the different objects. Therefore, in this research we use a clustering-based space that converts the geometric space of the molecular data into a categorical space based on clustering results. We then use this space to develop a new classification algorithm. RESULTS: In this study, we propose a new classification method named GrpClassifierEC that replaces the given data space with a categorical space based on ensemble clustering (EC). The EC space is defined by tracking the cluster membership of the points over multiple runs of clustering algorithms. Different points that fall into the same clusters in every run are represented as a single point, and our algorithm classifies all of these points as a single class. The similarity between two objects is measured by the number of times these objects did not belong to the same cluster, so a smaller count means more similar objects. To evaluate the proposed method, we compare its results with those of the k-nearest neighbors, decision tree, and random forest classification algorithms on several benchmark datasets. The results show that GrpClassifierEC outperforms these algorithms on the benchmark datasets tested. CONCLUSIONS: Our algorithm can be integrated with many other algorithms. In this research, we use only the k-means clustering algorithm with different k values. For future research, we propose several directions: (1) examining how the choice of clustering algorithm affects the ensemble clustering space, (2) detecting poor clustering results based on the training data, and (3) reducing the volume of the data by combining similar points based on the EC space. AVAILABILITY AND IMPLEMENTATION: The KNIME workflow implementing GrpClassifierEC is available at https://malikyousef.com |
format | Online Article Text |
id | pubmed-7017541 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7017541 2020-02-20; Algorithms Mol Biol; Research; BioMed Central 2020-02-13 /pmc/articles/PMC7017541/ /pubmed/32082410 http://dx.doi.org/10.1186/s13015-020-0162-7 Text en © The Author(s) 2020 Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | GrpClassifierEC: a novel classification approach based on the ensemble clustering space |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7017541/ https://www.ncbi.nlm.nih.gov/pubmed/32082410 http://dx.doi.org/10.1186/s13015-020-0162-7 |
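The abstract in the description describes the ensemble clustering (EC) space only in words. The Python sketch below illustrates the idea under stated assumptions; it is not the authors' KNIME workflow. It takes the cluster labels from several clustering runs as input (in the study these come from k-means with different k values; here they are hand-written), builds each point's EC vector, collapses points with identical vectors into one group, and counts the runs in which two points were not in the same cluster. The helper names `ec_vectors`, `ec_distance`, `build_groups` and `classify` are hypothetical, and two rules are assumptions not stated in the abstract: a group takes the majority class of its training points, and a new point takes the class of the group with the smallest mismatch count. Computing a new point's EC vector (for example, by assigning it to the nearest k-means centre in each run) is also assumed and left out.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple

# An EC vector holds one cluster label per clustering run.
ECVector = Tuple[int, ...]


def ec_vectors(runs: Sequence[Sequence[int]]) -> List[ECVector]:
    """Map each point to its EC vector.

    runs[r][i] is the cluster label of point i in clustering run r
    (e.g. one k-means run per value of k).
    """
    n_points = len(runs[0])
    return [tuple(run[i] for run in runs) for i in range(n_points)]


def ec_distance(a: ECVector, b: ECVector) -> int:
    """Count the runs in which two points were NOT in the same cluster.

    Lower values mean more similar points (a Hamming distance).
    """
    return sum(1 for x, y in zip(a, b) if x != y)


def build_groups(vectors: Sequence[ECVector],
                 labels: Sequence[str]) -> Dict[ECVector, str]:
    """Collapse points with identical EC vectors into one group.

    Assumption: each group gets the majority class of its members;
    ties go to the class encountered first (Counter.most_common order).
    """
    members: Dict[ECVector, List[str]] = defaultdict(list)
    for vec, label in zip(vectors, labels):
        members[vec].append(label)
    return {vec: Counter(lbls).most_common(1)[0][0]
            for vec, lbls in members.items()}


def classify(groups: Dict[ECVector, str], query: ECVector) -> str:
    """Assumed rule: return the class of the nearest group in EC space.

    Ties go to the group that was created first.
    """
    nearest = min(groups, key=lambda vec: ec_distance(vec, query))
    return groups[nearest]


# Hypothetical example: three clustering runs over six training points.
runs = [
    [0, 0, 1, 1, 2, 2],  # run 1 (e.g. k = 3)
    [0, 0, 1, 1, 1, 1],  # run 2 (e.g. k = 2)
    [0, 0, 0, 1, 1, 1],  # run 3 (e.g. k = 2 with another seed)
]
labels = ["A", "A", "A", "B", "B", "B"]

vectors = ec_vectors(runs)
groups = build_groups(vectors, labels)
print(len(groups))                  # 4 groups from 6 points
print(classify(groups, (0, 0, 1)))  # A
```

Because the mismatch count compares labels position by position, cluster IDs only need to be consistent within a single run, not across runs, which is why runs with different k can be mixed freely in the EC vector.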