Clustering the Normalized Compression Distance for Influenza Virus Data
The present paper analyzes the usefulness of the normalized compression distance for the problem to cluster the hemagglutinin (HA) sequences of influenza virus data for the HA gene in dependence on the available compressors. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are co...
Main authors: | Ito, Kimihito; Zeugmann, Thomas; Zhu, Yu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7121547/ http://dx.doi.org/10.1007/978-3-642-12476-1_9 |
_version_ | 1783515226699726848 |
---|---|
author | Ito, Kimihito; Zeugmann, Thomas; Zhu, Yu |
author_facet | Ito, Kimihito; Zeugmann, Thomas; Zhu, Yu |
author_sort | Ito, Kimihito |
collection | PubMed |
description | The present paper analyzes the usefulness of the normalized compression distance for the problem to cluster the hemagglutinin (HA) sequences of influenza virus data for the HA gene in dependence on the available compressors. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are compared. Moreover, a comparison is made with respect to hierarchical and spectral clustering. For the hierarchical clustering, hclust from the R package is used, and the spectral clustering is done via the kLine algorithm proposed by Fischer and Poland (2004). Our results are very promising and show that one can obtain an (almost) perfect clustering. It turned out that the zlib compressor allowed for better results than the bzip2 compressor and, if all data are concerned, then hierarchical clustering is a bit better than spectral clustering via kLines. |
format | Online Article Text |
id | pubmed-7121547 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
record_format | MEDLINE/PubMed |
spelling | pubmed-71215472020-04-06 Clustering the Normalized Compression Distance for Influenza Virus Data Ito, Kimihito Zeugmann, Thomas Zhu, Yu Algorithms and Applications Article The present paper analyzes the usefulness of the normalized compression distance for the problem to cluster the hemagglutinin (HA) sequences of influenza virus data for the HA gene in dependence on the available compressors. Using the CompLearn Toolkit, the built-in compressors zlib and bzip2 are compared. Moreover, a comparison is made with respect to hierarchical and spectral clustering. For the hierarchical clustering, hclust from the R package is used, and the spectral clustering is done via the kLine algorithm proposed by Fischer and Poland (2004). Our results are very promising and show that one can obtain an (almost) perfect clustering. It turned out that the zlib compressor allowed for better results than the bzip2 compressor and, if all data are concerned, then hierarchical clustering is a bit better than spectral clustering via kLines. 2010 /pmc/articles/PMC7121547/ http://dx.doi.org/10.1007/978-3-642-12476-1_9 Text en © Springer-Verlag Berlin Heidelberg 2010 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
spellingShingle | Article Ito, Kimihito Zeugmann, Thomas Zhu, Yu Clustering the Normalized Compression Distance for Influenza Virus Data |
title | Clustering the Normalized Compression Distance for Influenza Virus Data |
title_full | Clustering the Normalized Compression Distance for Influenza Virus Data |
title_fullStr | Clustering the Normalized Compression Distance for Influenza Virus Data |
title_full_unstemmed | Clustering the Normalized Compression Distance for Influenza Virus Data |
title_short | Clustering the Normalized Compression Distance for Influenza Virus Data |
title_sort | clustering the normalized compression distance for influenza virus data |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7121547/ http://dx.doi.org/10.1007/978-3-642-12476-1_9 |
work_keys_str_mv | AT itokimihito clusteringthenormalizedcompressiondistanceforinfluenzavirusdata AT zeugmannthomas clusteringthenormalizedcompressiondistanceforinfluenzavirusdata AT zhuyu clusteringthenormalizedcompressiondistanceforinfluenzavirusdata |
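
Illustrative note (not part of the record): the description above refers to the normalized compression distance (NCD) computed with the zlib and bzip2 compressors. The sketch below uses the standard definition NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)), where C is the compressed length in bytes, and Python's standard `zlib` and `bz2` modules. It does not reproduce the CompLearn Toolkit or the authors' pipeline. The function names and the `toy_sequences` helper are hypothetical, and the toy data are random nucleotide strings, not influenza HA sequences.

```python
import bz2
import random
import zlib
from typing import Callable, Dict, List

Compressor = Callable[[bytes], bytes]

# Two stand-in compressors from the Python standard library.
# Assumption: these only approximate what a dedicated toolkit would use.
COMPRESSORS: Dict[str, Compressor] = {
    "zlib": lambda data: zlib.compress(data, 9),
    "bzip2": lambda data: bz2.compress(data, 9),
}


def compressed_size(data: bytes, compress: Compressor) -> int:
    """Length in bytes of the compressed representation of data."""
    return len(compress(data))


def ncd(x: bytes, y: bytes, compress: Compressor) -> float:
    """Normalized compression distance between x and y."""
    cx = compressed_size(x, compress)
    cy = compressed_size(y, compress)
    cxy = compressed_size(x + y, compress)
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: List[bytes], compress: Compressor) -> List[List[float]]:
    """Pairwise NCD matrix; real compressors make it only approximately symmetric."""
    n = len(seqs)
    return [[ncd(seqs[i], seqs[j], compress) for j in range(n)] for i in range(n)]


def toy_sequences(seed: int = 0, length: int = 2000, mutations: int = 20) -> List[bytes]:
    """Hypothetical toy data: a random base sequence, a lightly mutated
    copy of it, and an unrelated random sequence."""
    rng = random.Random(seed)
    base = [rng.choice("ACGT") for _ in range(length)]
    variant = list(base)
    for pos in rng.sample(range(length), mutations):
        variant[pos] = rng.choice("ACGT")
    unrelated = [rng.choice("ACGT") for _ in range(length)]
    return ["".join(s).encode("ascii") for s in (base, variant, unrelated)]


if __name__ == "__main__":
    seqs = toy_sequences()
    for name, compress in COMPRESSORS.items():
        print(name)
        for row in distance_matrix(seqs, compress):
            print("  ", [round(d, 3) for d in row])
```

A matrix of this kind could then be passed to a hierarchical or spectral clustering routine. The record names hclust in R and the kLine algorithm for those steps, and neither is reproduced here.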