Visualization of SNPs with t-SNE

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. PRINCIPAL FINDINGS: We compare PCA, an aging method for this pu...

Bibliographic Details
Main Author: Platzer, Alexander
Format: Online Article Text
Language: English
Published: Public Library of Science 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574019/
https://www.ncbi.nlm.nih.gov/pubmed/23457633
http://dx.doi.org/10.1371/journal.pone.0056883
_version_ 1782259549257859072
author Platzer, Alexander
author_facet Platzer, Alexander
author_sort Platzer, Alexander
collection PubMed
description BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. PRINCIPAL FINDINGS: We compare PCA, an aging method for this purpose, with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; in all of these t-SNE performs better. SIGNIFICANCE: To transform data PCA remains a reasonably good method, but for visualization it should be replaced by a method from the subfield of dimension reduction. To evaluate the performance of visualization, we propose key figures of cross-validation with machine learning methods, as well as indices of cluster validity.
format Online
Article
Text
id pubmed-3574019
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3574019 2013-03-01 Visualization of SNPs with t-SNE Platzer, Alexander PLoS One Research Article Public Library of Science 2013-02-15 /pmc/articles/PMC3574019/ /pubmed/23457633 http://dx.doi.org/10.1371/journal.pone.0056883 Text en © 2013 Alexander Platzer http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Platzer, Alexander
Visualization of SNPs with t-SNE
title Visualization of SNPs with t-SNE
title_full Visualization of SNPs with t-SNE
title_fullStr Visualization of SNPs with t-SNE
title_full_unstemmed Visualization of SNPs with t-SNE
title_short Visualization of SNPs with t-SNE
title_sort visualization of snps with t-sne
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574019/
https://www.ncbi.nlm.nih.gov/pubmed/23457633
http://dx.doi.org/10.1371/journal.pone.0056883
work_keys_str_mv AT platzeralexander visualizationofsnpswithtsne
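
The abstract in this record proposes judging a 2-D visualization by cross-validation with machine learning methods and by indices of cluster validity, but the record does not say which classifiers or indices the article uses. The sketch below is therefore only an illustration under assumptions: it uses leave-one-out 1-nearest-neighbour accuracy and the mean silhouette width as stand-ins, and the function names `loo_1nn_accuracy` and `mean_silhouette`, as well as the toy coordinates and labels, are hypothetical. It scores embeddings that already exist (for example from PCA or t-SNE); it does not compute them.

```python
import math


def loo_1nn_accuracy(points, labels):
    """Leave-one-out accuracy of a 1-nearest-neighbour classifier.

    Stand-in for "cross-validation with machine learning methods";
    the article's actual classifiers are not given in this record.
    """
    correct = 0
    for i, p in enumerate(points):
        # Nearest other point; ties go to the lowest index.
        nearest = min(
            (j for j in range(len(points)) if j != i),
            key=lambda j: math.dist(p, points[j]),
        )
        correct += labels[nearest] == labels[i]
    return correct / len(points)


def mean_silhouette(points, labels):
    """Mean silhouette width, one common cluster validity index (assumed here)."""
    scores = []
    for i, p in enumerate(points):
        # Collect distances from point i to every other point, grouped by label.
        dists = {}
        for j, q in enumerate(points):
            if j != i:
                dists.setdefault(labels[j], []).append(math.dist(p, q))
        own = dists.pop(labels[i], None)
        if not own or not dists:
            scores.append(0.0)  # singleton group, or only one group present
            continue
        a = sum(own) / len(own)                             # mean distance within own group
        b = min(sum(d) / len(d) for d in dists.values())    # nearest other group
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


# Hypothetical 2-D embeddings of six individuals from two populations.
labels = ["A", "A", "A", "B", "B", "B"]
separated = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
interleaved = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0), (11, 10)]

for name, emb in [("separated", separated), ("interleaved", interleaved)]:
    print(name, loo_1nn_accuracy(emb, labels), round(mean_silhouette(emb, labels), 3))
```

Both scores depend only on the 2-D coordinates and the known group labels, so the same two functions can be applied unchanged to a PCA projection and a t-SNE embedding of the same individuals; higher values on both would suggest that the visualization separates the groups more cleanly.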