
Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering

Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N²) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-a...


Bibliographic Details
Main authors: Lerato, Lerato; Niesler, Thomas
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4627777/
https://www.ncbi.nlm.nih.gov/pubmed/26517376
http://dx.doi.org/10.1371/journal.pone.0141756
author Lerato, Lerato
Niesler, Thomas
collection PubMed
description Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N²) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered separately. This reduces the storage required for sequential implementations, and allows concurrent computation on parallel computing hardware. The resultant clusters are merged and subsequently re-divided into subsets, which are passed to the following iteration. We show that MAHC can match and even surpass the performance of the exact implementation when applied to datasets of speech segments.
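The divide-and-conquer loop described above can be illustrated with a small sketch. A full pairwise distance matrix over N items holds N(N-1)/2 entries, so clustering P equal subsets one at a time needs roughly 1/P² of that storage per call. The code below is only an illustration under stated assumptions, not the authors' implementation: the function names (`ahc`, `mahc`, `centroid_dist`), the use of fixed-length points with Euclidean distance, average linkage, the fixed cluster counts, and re-division by clustering cluster centroids are all hypothetical choices that the abstract does not specify.

```python
import math
import random


def ahc(items, k, dist):
    """Naive average-linkage agglomerative clustering down to k groups.

    Distances are recomputed rather than stored in a matrix; the sketch is
    about control flow, not about an efficient AHC implementation.
    """
    groups = [[item] for item in items]
    while len(groups) > max(k, 1):
        best = None
        for i in range(len(groups)):
            for j in range(i + 1, len(groups)):
                total = sum(dist(a, b) for a in groups[i] for b in groups[j])
                avg = total / (len(groups[i]) * len(groups[j]))
                if best is None or avg < best[0]:
                    best = (avg, i, j)
        _, i, j = best
        groups[i].extend(groups[j])  # merge the closest pair (i < j)
        del groups[j]
    return groups


def centroid(cluster):
    """Mean point of a cluster, used here as its representative (assumption)."""
    dims = len(cluster[0])
    return tuple(sum(p[d] for p in cluster) / len(cluster) for d in range(dims))


def centroid_dist(cluster_a, cluster_b):
    """Distance between two clusters, measured between their centroids."""
    return math.dist(centroid(cluster_a), centroid(cluster_b))


def mahc(points, num_subsets, per_subset_k, final_k, iterations=2, seed=0):
    """Hypothetical multi-stage loop: split, cluster, merge, re-divide."""
    shuffled = list(points)
    random.Random(seed).shuffle(shuffled)
    # Initial split into independent subsets.
    subsets = [shuffled[i::num_subsets] for i in range(num_subsets)]
    merged = []
    for _ in range(iterations):
        # Cluster each subset separately; these calls are independent and
        # could run concurrently.
        merged = []
        for subset in subsets:
            if subset:
                merged.extend(ahc(subset, per_subset_k, math.dist))
        # Merge the resulting clusters and re-divide them into new subsets
        # by grouping similar clusters together (assumed re-division rule).
        groups = ahc(merged, num_subsets, centroid_dist)
        subsets = [[p for cluster in g for p in cluster] for g in groups]
    # Final grouping of the last iteration's clusters (assumption).
    final_groups = ahc(merged, final_k, centroid_dist)
    return [[p for cluster in g for p in cluster] for g in final_groups]


if __name__ == "__main__":
    demo = [(0, 0), (0, 1), (1, 0), (1, 1),
            (10, 10), (10, 11), (11, 10), (11, 11)]
    for cluster in mahc(demo, num_subsets=2, per_subset_k=2, final_k=2):
        print(sorted(cluster))
```

Speech segments vary in length, so a real system would need a distance defined on sequences rather than `math.dist`. Passing the distance in as the `dist` argument keeps that choice separate from the multi-stage control flow.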
format Online
Article
Text
id pubmed-4627777
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4627777 2015-11-06 Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering Lerato, Lerato Niesler, Thomas PLoS One Research Article Agglomerative hierarchical clustering becomes infeasible when applied to large datasets due to its O(N²) storage requirements. We present a multi-stage agglomerative hierarchical clustering (MAHC) approach aimed at large datasets of speech segments. The algorithm is based on an iterative divide-and-conquer strategy. The data is first split into independent subsets, each of which is clustered separately. This reduces the storage required for sequential implementations, and allows concurrent computation on parallel computing hardware. The resultant clusters are merged and subsequently re-divided into subsets, which are passed to the following iteration. We show that MAHC can match and even surpass the performance of the exact implementation when applied to datasets of speech segments. Public Library of Science 2015-10-30 /pmc/articles/PMC4627777/ /pubmed/26517376 http://dx.doi.org/10.1371/journal.pone.0141756 Text en © 2015 Lerato, Niesler http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title Clustering Acoustic Segments Using Multi-Stage Agglomerative Hierarchical Clustering
topic Research Article