Visualization of SNPs with t-SNE

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. PRINCIPAL FINDINGS: We compare PCA, an aging method for this pu...

Bibliographic Details
Main Author:	Platzer, Alexander
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2013
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574019/ https://www.ncbi.nlm.nih.gov/pubmed/23457633 http://dx.doi.org/10.1371/journal.pone.0056883

_version_	1782259549257859072
author	Platzer, Alexander
author_facet	Platzer, Alexander
author_sort	Platzer, Alexander
collection	PubMed
description	BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. PRINCIPAL FINDINGS: We compare PCA, an aging method for this purpose, with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; in all of these t-SNE performs better. SIGNIFICANCE: To transform data PCA remains a reasonably good method, but for visualization it should be replaced by a method from the subfield of dimension reduction. To evaluate the performance of visualization, we propose key figures of cross-validation with machine learning methods, as well as indices of cluster validity.
format	Online Article Text
id	pubmed-3574019
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-35740192013-03-01 Visualization of SNPs with t-SNE Platzer, Alexander PLoS One Research Article BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. PRINCIPAL FINDINGS: We compare PCA, an aging method for this purpose, with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; in all of these t-SNE performs better. SIGNIFICANCE: To transform data PCA remains a reasonably good method, but for visualization it should be replaced by a method from the subfield of dimension reduction. To evaluate the performance of visualization, we propose key figures of cross-validation with machine learning methods, as well as indices of cluster validity. Public Library of Science 2013-02-15 /pmc/articles/PMC3574019/ /pubmed/23457633 http://dx.doi.org/10.1371/journal.pone.0056883 Text en © 2013 Alexander Platzer http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle	Research Article Platzer, Alexander Visualization of SNPs with t-SNE
title	Visualization of SNPs with t-SNE
title_full	Visualization of SNPs with t-SNE
title_fullStr	Visualization of SNPs with t-SNE
title_full_unstemmed	Visualization of SNPs with t-SNE
title_short	Visualization of SNPs with t-SNE
title_sort	visualization of snps with t-sne
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574019/ https://www.ncbi.nlm.nih.gov/pubmed/23457633 http://dx.doi.org/10.1371/journal.pone.0056883
work_keys_str_mv	AT platzeralexander visualizationofsnpswithtsne
