An efficient clustering algorithm for partitioning Y-short tandem repeats data
BACKGROUND: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering resul...
Main Authors: | Seman, Ali, Bakar, Zainab Abu, Isa, Mohamed Nizam |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571976/ https://www.ncbi.nlm.nih.gov/pubmed/23039132 http://dx.doi.org/10.1186/1756-0500-5-557 |
_version_ | 1782259249519263744 |
---|---|
author | Seman, Ali Bakar, Zainab Abu Isa, Mohamed Nizam |
author_facet | Seman, Ali Bakar, Zainab Abu Isa, Mohamed Nizam |
author_sort | Seman, Ali |
collection | PubMed |
description | BACKGROUND: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. RESULTS: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). CONCLUSIONS: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear. |
format | Online Article Text |
id | pubmed-3571976 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-35719762013-02-20 An efficient clustering algorithm for partitioning Y-short tandem repeats data Seman, Ali Bakar, Zainab Abu Isa, Mohamed Nizam BMC Res Notes Research Article BACKGROUND: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. RESULTS: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). CONCLUSIONS: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear. BioMed Central 2012-10-06 /pmc/articles/PMC3571976/ /pubmed/23039132 http://dx.doi.org/10.1186/1756-0500-5-557 Text en Copyright ©2012 Seman et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Seman, Ali Bakar, Zainab Abu Isa, Mohamed Nizam An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title | An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_full | An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_fullStr | An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_full_unstemmed | An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_short | An efficient clustering algorithm for partitioning Y-short tandem repeats data |
title_sort | efficient clustering algorithm for partitioning y-short tandem repeats data |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571976/ https://www.ncbi.nlm.nih.gov/pubmed/23039132 http://dx.doi.org/10.1186/1756-0500-5-557 |
work_keys_str_mv | AT semanali anefficientclusteringalgorithmforpartitioningyshorttandemrepeatsdata AT bakarzainababu anefficientclusteringalgorithmforpartitioningyshorttandemrepeatsdata AT isamohamednizam anefficientclusteringalgorithmforpartitioningyshorttandemrepeatsdata AT semanali efficientclusteringalgorithmforpartitioningyshorttandemrepeatsdata AT bakarzainababu efficientclusteringalgorithmforpartitioningyshorttandemrepeatsdata AT isamohamednizam efficientclusteringalgorithmforpartitioningyshorttandemrepeatsdata |
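
The description above states a time complexity of O(km(n-k)) and calls it linear. The record does not define the symbols, so the sketch below assumes the usual reading for partitioning algorithms: n is the number of objects, k the number of clusters, and m the number of attributes per object. Under that assumption, the expression grows linearly in n when k and m are held fixed. The function name and the sample values are hypothetical and serve only to illustrate the arithmetic; this is not the authors' implementation.

```python
def kamh_cost_bound(n: int, k: int, m: int) -> int:
    """Evaluate k*m*(n-k), the expression inside the stated O(km(n-k)) bound.

    Assumed meaning of the symbols (not defined in the record):
    n = number of objects, k = number of clusters, m = attributes per object.
    """
    if not (0 < k <= n) or m <= 0:
        raise ValueError("expected 0 < k <= n and m > 0")
    return k * m * (n - k)


if __name__ == "__main__":
    k, m = 4, 25  # hypothetical values, chosen only for illustration
    for n in (1_000, 2_000, 4_000):
        # With k and m fixed, each extra object adds exactly k*m to the bound.
        print(n, kamh_cost_bound(n, k, m))
```

With k and m fixed, adding one object raises the expression by exactly k*m, which is the sense in which the bound is linear in n.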