Simcluster: clustering enumeration gene expression data on the simplex space


Bibliographic Details
Main authors: Vêncio, Ricardo ZN; Varuzza, Leonardo; de B Pereira, Carlos A; Brentani, Helena; Shmulevich, Ilya
Format: Text
Language: English
Published: BioMed Central 2007
Subjects: Software
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2147035/
https://www.ncbi.nlm.nih.gov/pubmed/17625017
http://dx.doi.org/10.1186/1471-2105-8-246
Collection: PubMed
Description: BACKGROUND: Transcript enumeration methods such as SAGE, MPSS, and sequencing-by-synthesis EST "digital northern" are important high-throughput techniques for digital gene expression measurement. Like other counting or voting processes, these measurements constitute compositional data, which exhibit properties particular to the simplex space, where the sum of the components is constrained. These properties are not present in ordinary Euclidean space, on which hybridization-based microarray data are often modeled. Therefore, pattern recognition methods commonly used for microarray data analysis may be non-informative for data generated by transcript enumeration techniques, since they ignore certain fundamental properties of this space. RESULTS: Here we present a software tool, Simcluster, designed to perform clustering analysis of data on the simplex space. Simcluster is available as a stand-alone command-line C package and as a user-friendly online tool. Both versions are available at: http://xerad.systemsbiology.net/simcluster. CONCLUSION: Simcluster is designed in accordance with a well-established mathematical framework for compositional data analysis, which provides principled procedures for dealing with the simplex space, and is thus applicable in a number of contexts, including enumeration-based gene expression data.
Record ID: pubmed-2147035
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Software article), published 2007-07-11
Copyright ©2007 Vêncio et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.