
An efficient clustering algorithm for partitioning Y-short tandem repeats data

BACKGROUND: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering resul...


Bibliographic Details
Main Authors: Seman, Ali; Bakar, Zainab Abu; Isa, Mohamed Nizam
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571976/
https://www.ncbi.nlm.nih.gov/pubmed/23039132
http://dx.doi.org/10.1186/1756-0500-5-557
author Seman, Ali
Bakar, Zainab Abu
Isa, Mohamed Nizam
collection PubMed
description BACKGROUND: Y-Short Tandem Repeats (Y-STR) data consist of many similar and almost similar objects. This characteristic of Y-STR data causes two problems with partitioning: non-unique centroids and local minima problems. As a result, the existing partitioning algorithms produce poor clustering results. RESULTS: Our new algorithm, called k-Approximate Modal Haplotypes (k-AMH), obtains the highest clustering accuracy scores for five out of six datasets, and produces an equal performance for the remaining dataset. Furthermore, clustering accuracy scores of 100% are achieved for two of the datasets. The k-AMH algorithm records the highest mean accuracy score of 0.93 overall, compared to that of other algorithms: k-Population (0.91), k-Modes-RVF (0.81), New Fuzzy k-Modes (0.80), k-Modes (0.76), k-Modes-Hybrid 1 (0.76), k-Modes-Hybrid 2 (0.75), Fuzzy k-Modes (0.74), and k-Modes-UAVM (0.70). CONCLUSIONS: The partitioning performance of the k-AMH algorithm for Y-STR data is superior to that of other algorithms, owing to its ability to solve the non-unique centroids and local minima problems. Our algorithm is also efficient in terms of time complexity, which is recorded as O(km(n-k)) and considered to be linear.
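The non-unique centroid problem named in the abstract can be illustrated with a small sketch. This is not the k-AMH algorithm itself; it only shows, under assumed toy data, how a mode-based centroid (as used by k-Modes-style methods) becomes ambiguous when a cluster holds many near-identical haplotypes. The function name `column_modes` and the allele values are hypothetical.

```python
from collections import Counter

def column_modes(haplotypes):
    """For each marker (column), return every allele value tied for most frequent."""
    modes = []
    for column in zip(*haplotypes):
        counts = Counter(column)
        top = max(counts.values())
        # Keep all values that reach the top frequency, not just the first one.
        modes.append(sorted(v for v, c in counts.items() if c == top))
    return modes

# Hypothetical toy cluster: four haplotypes, three markers (repeat counts).
cluster = [
    (14, 12, 29),
    (14, 13, 29),
    (15, 12, 30),
    (15, 13, 30),
]

tied = column_modes(cluster)

# Every combination of tied values is an equally valid mode vector.
n_candidate_centroids = 1
for values in tied:
    n_candidate_centroids *= len(values)

print(tied)
print(n_candidate_centroids)
```

In this toy cluster every marker has a two-way tie, so a mode-based method has several equally valid centroids to choose from, and the choice can steer the partition toward different local minima. On the stated complexity: for fixed k and m, km(n-k) is at most kmn, which grows linearly in the number of objects n.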
format Online
Article
Text
id pubmed-3571976
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3571976 2013-02-20 BMC Res Notes Research Article BioMed Central 2012-10-06 /pmc/articles/PMC3571976/ /pubmed/23039132 http://dx.doi.org/10.1186/1756-0500-5-557 Text en Copyright ©2012 Seman et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An efficient clustering algorithm for partitioning Y-short tandem repeats data
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3571976/
https://www.ncbi.nlm.nih.gov/pubmed/23039132
http://dx.doi.org/10.1186/1756-0500-5-557