
Density-based hierarchical clustering of pyro-sequences on a large scale—the case of fungal ITS1

Motivation: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducib...


Bibliographic Details
Main Authors: Pagni, Marco, Niculita-Hirzel, Hélène, Pellissier, Loïc, Dubuis, Anne, Xenarios, Ioannis, Guisan, Antoine, Sanders, Ian R., Goudet, Jérôme, Guex, Nicolas
Format: Online Article Text
Language: English
Published: Oxford University Press 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654712/
https://www.ncbi.nlm.nih.gov/pubmed/23539304
http://dx.doi.org/10.1093/bioinformatics/btt149
author Pagni, Marco
Niculita-Hirzel, Hélène
Pellissier, Loïc
Dubuis, Anne
Xenarios, Ioannis
Guisan, Antoine
Sanders, Ian R.
Goudet, Jérôme
Guex, Nicolas
collection PubMed
description Motivation: Analysis of millions of pyro-sequences is currently playing a crucial role in the advance of environmental microbiology. Taxonomy-independent, i.e. unsupervised, clustering of these sequences is essential for the definition of Operational Taxonomic Units. For this application, reproducibility and robustness should be the most sought after qualities, but have thus far largely been overlooked. Results: More than 1 million hyper-variable internal transcribed spacer 1 (ITS1) sequences of fungal origin have been analyzed. The ITS1 sequences were first properly extracted from 454 reads using generalized profiles. Then, otupipe, cd-hit-454, ESPRIT-Tree and DBC454, a new algorithm presented here, were used to analyze the sequences. A numerical assay was developed to measure the reproducibility and robustness of these algorithms. DBC454 was the most robust, closely followed by ESPRIT-Tree. DBC454 features density-based hierarchical clustering, which complements the other methods by providing insights into the structure of the data. Availability: An executable is freely available for non-commercial users at ftp://ftp.vital-it.ch/tools/dbc454. It is designed to run under MPI on a cluster of 64-bit Linux machines running Red Hat 4.x, or on a multi-core OSX system. Contact: dbc454@vital-it.ch or nicolas.guex@isb-sib.ch
format Online
Article
Text
id pubmed-3654712
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3654712 2013-05-17 Bioinformatics Original Papers Oxford University Press 2013-05-15 2013-03-28 /pmc/articles/PMC3654712/ /pubmed/23539304 http://dx.doi.org/10.1093/bioinformatics/btt149 Text en © The Author 2013. Published by Oxford University Press.
http://creativecommons.org/licenses/by/3.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Density-based hierarchical clustering of pyro-sequences on a large scale—the case of fungal ITS1
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3654712/
https://www.ncbi.nlm.nih.gov/pubmed/23539304
http://dx.doi.org/10.1093/bioinformatics/btt149