Cargando…

Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps

BACKGROUND: For analyzing these gene expression data sets under different samples, clustering and visualizing samples and genes are important methods. However, it is difficult to integrate clustering and visualizing techniques when the similarities of samples and genes are defined by PCC(Person corr...

Descripción completa

Detalles Bibliográficos
Autores principales:	Jia, Xingang, Han, Qiuhong, Lu, Zuhong
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	BioMed Central 2018
Materias:	Methodology Article
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6296107/ https://www.ncbi.nlm.nih.gov/pubmed/30558536 http://dx.doi.org/10.1186/s12859-018-2495-5

_version_	1783380982388228096
author	Jia, Xingang Han, Qiuhong Lu, Zuhong
author_facet	Jia, Xingang Han, Qiuhong Lu, Zuhong
author_sort	Jia, Xingang
collection	PubMed
description	BACKGROUND: For analyzing these gene expression data sets under different samples, clustering and visualizing samples and genes are important methods. However, it is difficult to integrate clustering and visualizing techniques when the similarities of samples and genes are defined by PCC(Person correlation coefficient) measure. RESULTS: Here, for rare samples of gene expression data sets, we use MG-PCC (mini-groups that are defined by PCC) algorithm to divide them into mini-groups, and use t-SNE-SSP maps to display these mini-groups, where the idea of MG-PCC algorithm is that the nearest neighbors should be in the same mini-groups, t-SNE-SSP map is selected from a series of t-SNE(t-statistic Stochastic Neighbor Embedding) maps of standardized samples, and these t-SNE maps have different perplexity parameter. Moreover, for PCC clusters of mass genes, they are displayed by t-SNE-SGI map, where t-SNE-SGI map is selected from a series of t-SNE maps of standardized genes, and these t-SNE maps have different initialization dimensions. Here, t-SNE-SSP and t-SNE-SGI maps are selected by A-value, where A-value is modeled from areas of clustering projections, and t-SNE-SSP and t-SNE-SGI maps are such t-SNE map that has the smallest A-value. CONCLUSIONS: From the analysis of cancer gene expression data sets, we demonstrate that MG-PCC algorithm is able to put tumor and normal samples into their respective mini-groups, and t-SNE-SSP(or t-SNE-SGI) maps are able to display the relationships between mini-groups(or PCC clusters) clearly. Furthermore, t-SNE-SS(m)(or t-SNE-SG(n)) maps are able to construct independent tree diagrams of the nearest sample(or gene) neighbors, where each tree diagram is corresponding to a mini-group of samples(or genes). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2495-5) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-6296107
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-62961072018-12-18 Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps Jia, Xingang Han, Qiuhong Lu, Zuhong BMC Bioinformatics Methodology Article BACKGROUND: For analyzing these gene expression data sets under different samples, clustering and visualizing samples and genes are important methods. However, it is difficult to integrate clustering and visualizing techniques when the similarities of samples and genes are defined by PCC(Person correlation coefficient) measure. RESULTS: Here, for rare samples of gene expression data sets, we use MG-PCC (mini-groups that are defined by PCC) algorithm to divide them into mini-groups, and use t-SNE-SSP maps to display these mini-groups, where the idea of MG-PCC algorithm is that the nearest neighbors should be in the same mini-groups, t-SNE-SSP map is selected from a series of t-SNE(t-statistic Stochastic Neighbor Embedding) maps of standardized samples, and these t-SNE maps have different perplexity parameter. Moreover, for PCC clusters of mass genes, they are displayed by t-SNE-SGI map, where t-SNE-SGI map is selected from a series of t-SNE maps of standardized genes, and these t-SNE maps have different initialization dimensions. Here, t-SNE-SSP and t-SNE-SGI maps are selected by A-value, where A-value is modeled from areas of clustering projections, and t-SNE-SSP and t-SNE-SGI maps are such t-SNE map that has the smallest A-value. CONCLUSIONS: From the analysis of cancer gene expression data sets, we demonstrate that MG-PCC algorithm is able to put tumor and normal samples into their respective mini-groups, and t-SNE-SSP(or t-SNE-SGI) maps are able to display the relationships between mini-groups(or PCC clusters) clearly. Furthermore, t-SNE-SS(m)(or t-SNE-SG(n)) maps are able to construct independent tree diagrams of the nearest sample(or gene) neighbors, where each tree diagram is corresponding to a mini-group of samples(or genes). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2495-5) contains supplementary material, which is available to authorized users. BioMed Central 2018-12-17 /pmc/articles/PMC6296107/ /pubmed/30558536 http://dx.doi.org/10.1186/s12859-018-2495-5 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Methodology Article Jia, Xingang Han, Qiuhong Lu, Zuhong Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps
title	Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps
title_full	Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps
title_fullStr	Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps
title_full_unstemmed	Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps
title_short	Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps
title_sort	analyzing the similarity of samples and genes by mg-pcc algorithm, t-sne-ss and t-sne-sg maps
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6296107/ https://www.ncbi.nlm.nih.gov/pubmed/30558536 http://dx.doi.org/10.1186/s12859-018-2495-5
work_keys_str_mv	AT jiaxingang analyzingthesimilarityofsamplesandgenesbymgpccalgorithmtsnessandtsnesgmaps AT hanqiuhong analyzingthesimilarityofsamplesandgenesbymgpccalgorithmtsnessandtsnesgmaps AT luzuhong analyzingthesimilarityofsamplesandgenesbymgpccalgorithmtsnessandtsnesgmaps

Analyzing the similarity of samples and genes by MG-PCC algorithm, t-SNE-SS and t-SNE-SG maps

Ejemplares similares