Visualization of SNPs with t-SNE
BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. PRINCIPAL FINDINGS: We compare PCA, an aging method for this pu...
Main author: | Platzer, Alexander |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science 2013 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574019/ https://www.ncbi.nlm.nih.gov/pubmed/23457633 http://dx.doi.org/10.1371/journal.pone.0056883 |
_version_ | 1782259549257859072 |
---|---|
author | Platzer, Alexander |
author_facet | Platzer, Alexander |
author_sort | Platzer, Alexander |
collection | PubMed |
description | BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. PRINCIPAL FINDINGS: We compare PCA, an aging method for this purpose, with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; in all of these t-SNE performs better. SIGNIFICANCE: To transform data PCA remains a reasonably good method, but for visualization it should be replaced by a method from the subfield of dimension reduction. To evaluate the performance of visualization, we propose key figures of cross-validation with machine learning methods, as well as indices of cluster validity. |
format | Online Article Text |
id | pubmed-3574019 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2013 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-3574019 2013-03-01 Visualization of SNPs with t-SNE Platzer, Alexander PLoS One Research Article BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are one of the largest sources of new data in biology. In most papers, SNPs between individuals are visualized with Principal Component Analysis (PCA), an older method for this purpose. PRINCIPAL FINDINGS: We compare PCA, an aging method for this purpose, with a newer method, t-Distributed Stochastic Neighbor Embedding (t-SNE) for the visualization of large SNP datasets. We also propose a set of key figures for evaluating these visualizations; in all of these t-SNE performs better. SIGNIFICANCE: To transform data PCA remains a reasonably good method, but for visualization it should be replaced by a method from the subfield of dimension reduction. To evaluate the performance of visualization, we propose key figures of cross-validation with machine learning methods, as well as indices of cluster validity. Public Library of Science 2013-02-15 /pmc/articles/PMC3574019/ /pubmed/23457633 http://dx.doi.org/10.1371/journal.pone.0056883 Text en © 2013 Alexander Platzer http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Platzer, Alexander Visualization of SNPs with t-SNE |
title | Visualization of SNPs with t-SNE |
title_full | Visualization of SNPs with t-SNE |
title_fullStr | Visualization of SNPs with t-SNE |
title_full_unstemmed | Visualization of SNPs with t-SNE |
title_short | Visualization of SNPs with t-SNE |
title_sort | visualization of snps with t-sne |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3574019/ https://www.ncbi.nlm.nih.gov/pubmed/23457633 http://dx.doi.org/10.1371/journal.pone.0056883 |
work_keys_str_mv | AT platzeralexander visualizationofsnpswithtsne |