Trustworthiness and metrics in visualizing similarity of gene expression

BACKGROUND: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of da...

Bibliographic Details
Main authors: Kaski, Samuel; Nikkilä, Janne; Oja, Merja; Venna, Jarkko; Törönen, Petri; Castrén, Eero
Format: Text
Language: English
Published: BioMed Central 2003
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC272927/
https://www.ncbi.nlm.nih.gov/pubmed/14552657
http://dx.doi.org/10.1186/1471-2105-4-48
_version_ 1782121050141622272
author Kaski, Samuel
Nikkilä, Janne
Oja, Merja
Venna, Jarkko
Törönen, Petri
Castrén, Eero
author_facet Kaski, Samuel
Nikkilä, Janne
Oja, Merja
Venna, Jarkko
Törönen, Petri
Castrén, Eero
author_sort Kaski, Samuel
collection PubMed
description BACKGROUND: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets. RESULTS: The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map were compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric. CONCLUSIONS: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric still improves it.
format Text
id pubmed-272927
institution National Center for Biotechnology Information
language English
publishDate 2003
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-272927 2003-11-22 Trustworthiness and metrics in visualizing similarity of gene expression Kaski, Samuel Nikkilä, Janne Oja, Merja Venna, Jarkko Törönen, Petri Castrén, Eero BMC Bioinformatics Research Article BACKGROUND: Conventionally, the first step in analyzing the large and high-dimensional data sets measured by microarrays is visual exploration. Dendrograms of hierarchical clustering, self-organizing maps (SOMs), and multidimensional scaling have been used to visualize similarity relationships of data samples. We address two central properties of the methods: (i) Are the visualizations trustworthy, i.e., if two samples are visualized to be similar, are they really similar? (ii) The metric. The measure of similarity determines the result; we propose using a new learning metrics principle to derive a metric from interrelationships among data sets. RESULTS: The trustworthiness of hierarchical clustering, multidimensional scaling, and the self-organizing map were compared in visualizing similarity relationships among gene expression profiles. The self-organizing map was the best except that hierarchical clustering was the most trustworthy for the most similar profiles. Trustworthiness can be further increased by treating separately those genes for which the visualization is least trustworthy. We then proceed to improve the metric. The distance measure between the expression profiles is adjusted to measure differences relevant to functional classes of the genes. The genes for which the new metric is the most different from the usual correlation metric are listed and visualized with one of the visualization methods, the self-organizing map, computed in the new metric. CONCLUSIONS: The conjecture from the methodological results is that the self-organizing map can be recommended to complement the usual hierarchical clustering for visualizing and exploring gene expression data. Discarding the least trustworthy samples and improving the metric still improves it. BioMed Central 2003-10-13 /pmc/articles/PMC272927/ /pubmed/14552657 http://dx.doi.org/10.1186/1471-2105-4-48 Text en Copyright © 2003 Kaski et al; licensee BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.
spellingShingle Research Article
Kaski, Samuel
Nikkilä, Janne
Oja, Merja
Venna, Jarkko
Törönen, Petri
Castrén, Eero
Trustworthiness and metrics in visualizing similarity of gene expression
title Trustworthiness and metrics in visualizing similarity of gene expression
title_full Trustworthiness and metrics in visualizing similarity of gene expression
title_fullStr Trustworthiness and metrics in visualizing similarity of gene expression
title_full_unstemmed Trustworthiness and metrics in visualizing similarity of gene expression
title_short Trustworthiness and metrics in visualizing similarity of gene expression
title_sort trustworthiness and metrics in visualizing similarity of gene expression
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC272927/
https://www.ncbi.nlm.nih.gov/pubmed/14552657
http://dx.doi.org/10.1186/1471-2105-4-48
work_keys_str_mv AT kaskisamuel trustworthinessandmetricsinvisualizingsimilarityofgeneexpression
AT nikkilajanne trustworthinessandmetricsinvisualizingsimilarityofgeneexpression
AT ojamerja trustworthinessandmetricsinvisualizingsimilarityofgeneexpression
AT vennajarkko trustworthinessandmetricsinvisualizingsimilarityofgeneexpression
AT toronenpetri trustworthinessandmetricsinvisualizingsimilarityofgeneexpression
AT castreneero trustworthinessandmetricsinvisualizingsimilarityofgeneexpression