Recent Experiences in Parameter-Free Data Mining
Recent results supporting the usefulness of the normalized compression distance for the task of classifying genome sequences of virus data are reported. Specifically, the problem of clustering the hemagglutinin (HA) sequences of influenza virus data for the HA gene in dependence on the host and subtype of...
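The normalized compression distance (NCD) is commonly defined, for a compressor C, as NCD(x, y) = (C(xy) - min{C(x), C(y)}) / max{C(x), C(y)}, where C(s) is the compressed length of s. Below is a minimal Python sketch of that formula. It is not the authors' code. It assumes Python's standard-library `zlib` and `bz2` as stand-ins for two of the compressors named in the record (ppmd has no standard-library binding and is omitted) and uses compression level 9, which is an assumed setting. It builds a distance matrix on hypothetical toy DNA strings rather than real virus data. The helpers `compressed_size`, `ncd`, `distance_matrix`, and `random_dna` are illustrative names. A matrix of this kind could then be passed to a hierarchical or spectral clustering routine.

```python
import bz2
import random
import zlib


def compressed_size(data: bytes, compressor: str = "zlib") -> int:
    """Length in bytes of `data` after compression (level 9 is an assumed setting)."""
    if compressor == "zlib":
        return len(zlib.compress(data, 9))
    if compressor == "bz2":
        return len(bz2.compress(data, 9))
    raise ValueError(f"unsupported compressor: {compressor}")


def ncd(x: bytes, y: bytes, compressor: str = "zlib") -> float:
    """NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y))."""
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)  # compress the concatenation xy
    return (cxy - min(cx, cy)) / max(cx, cy)


def distance_matrix(seqs: list[bytes], compressor: str = "zlib") -> list[list[float]]:
    """Pairwise NCD matrix. The diagonal is set to 0 by convention,
    because real compressors give NCD(x, x) slightly above 0."""
    n = len(seqs)
    return [
        [0.0 if i == j else ncd(seqs[i], seqs[j], compressor) for j in range(n)]
        for i in range(n)
    ]


def random_dna(seed: int, n: int = 2000) -> bytes:
    """Hypothetical toy sequence over the alphabet ACGT (not real virus data)."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n)).encode("ascii")


# Toy usage: y is x with a point substitution every 100 bases; z is unrelated.
x = random_dna(seed=1)
y = bytearray(x)
for i in range(0, len(y), 100):
    y[i] = ord("C") if y[i] == ord("A") else ord("A")
y = bytes(y)
z = random_dna(seed=2)

for name in ("zlib", "bz2"):
    print(name, round(ncd(x, y, name), 3), round(ncd(x, z, name), 3))
```

With these toy inputs, the near-duplicate pair (x, y) should receive a noticeably smaller NCD than the unrelated pair (x, z) under both compressors. Exact values depend on the compressor and the library version.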
Main authors: | Ito, Kimihito; Zeugmann, Thomas; Zhu, Yu |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2010 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7121110/ http://dx.doi.org/10.1007/978-90-481-9794-1_68 |
author | Ito, Kimihito Zeugmann, Thomas Zhu, Yu |
collection | PubMed |
description | Recent results supporting the usefulness of the normalized compression distance for the task of classifying genome sequences of virus data are reported. Specifically, two problems are studied: clustering the hemagglutinin (HA) sequences of influenza virus data for the HA gene according to the host and subtype of the virus, and classifying dengue virus genome data with respect to their four serotypes. Results are compared for hierarchical clustering and for spectral clustering via the kLine algorithm of Fischer and Poland (2004), and for the standard compressors bzip2, ppmd, and zlib. The results are very promising and show that an (almost) perfect clustering can be obtained for all the problems studied. |
format | Online Article Text |
id | pubmed-7121110 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2010 |
record_format | MEDLINE/PubMed |
spelling | pubmed-71211102020-04-06 Recent Experiences in Parameter-Free Data Mining Ito, Kimihito Zeugmann, Thomas Zhu, Yu Computer and Information Sciences Article Recent results supporting the usefulness of the normalized compression distance for the task to classify genome sequences of virus data are reported. Specifically, the problem to cluster the hemagglutinin (HA) sequences of influenza virus data for the HA gene in dependence on the host and subtype of the virus, and the classification of dengue virus genome data with respect to their four serotypes are studied. A comparison is made with respect to hierarchical and spectral clustering via the kLine algorithm by Fischer and Poland (2004), respectively, and with respect to the standard compressors bzip2, ppmd, and zlib. Our results are very promising and show that one can obtain an (almost) perfect clustering for all the problems studied. 2010-06-30 /pmc/articles/PMC7121110/ http://dx.doi.org/10.1007/978-90-481-9794-1_68 Text en © Springer Science+Business Media B.V. 2011 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source. These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic. |
title | Recent Experiences in Parameter-Free Data Mining |
topic | Article |