pong: fast analysis and visualization of latent clusters in population genetic data

Motivation: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership...

Bibliographic Details
Main authors: Behr, Aaron A.; Liu, Katherine Z.; Liu-Fang, Gracie; Nakka, Priyanka; Ramachandran, Sohini
Format: Online Article Text
Language: English
Published: Oxford University Press 2016
Subjects: Original Papers
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018373/
https://www.ncbi.nlm.nih.gov/pubmed/27283948
http://dx.doi.org/10.1093/bioinformatics/btw327
author Behr, Aaron A.
Liu, Katherine Z.
Liu-Fang, Gracie
Nakka, Priyanka
Ramachandran, Sohini
author_facet Behr, Aaron A.
Liu, Katherine Z.
Liu-Fang, Gracie
Nakka, Priyanka
Ramachandran, Sohini
author_sort Behr, Aaron A.
collection PubMed
description Motivation: A series of methods in population genetics use multilocus genotype data to assign individuals membership in latent clusters. These methods belong to a broad class of mixed-membership models, such as latent Dirichlet allocation used to analyze text corpora. Inference from mixed-membership models can produce different output matrices when repeatedly applied to the same inputs, and the number of latent clusters is a parameter that is often varied in the analysis pipeline. For these reasons, quantifying, visualizing, and annotating the output from mixed-membership models are bottlenecks for investigators across multiple disciplines from ecology to text data mining. Results: We introduce pong, a network-graphical approach for analyzing and visualizing membership in latent clusters with a native interactive D3.js visualization. pong leverages efficient algorithms for solving the Assignment Problem to dramatically reduce runtime while increasing accuracy compared with other methods that process output from mixed-membership models. We apply pong to 225 705 unlinked genome-wide single-nucleotide variants from 2426 unrelated individuals in the 1000 Genomes Project, and identify previously overlooked aspects of global human population structure. We show that pong outpaces current solutions by more than an order of magnitude in runtime while providing a customizable and interactive visualization of population structure that is more accurate than those produced by current tools. Availability and Implementation: pong is freely available and can be installed using the Python package management system pip. pong’s source code is available at https://github.com/abehr/pong. Contact: aaron_behr@alumni.brown.edu or sramachandran@brown.edu Supplementary Information: Supplementary data are available at Bioinformatics online.
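The alignment step mentioned in the description above (matching clusters across runs by solving the Assignment Problem) can be illustrated with a minimal Python sketch. It is not pong's implementation: the function names, the similarity measure (one minus mean absolute difference), and the toy matrices are assumptions made for illustration. The search is a brute-force pass over permutations, which is only practical for small K; an efficient solver such as the Hungarian algorithm would take its place in practice.

```python
from itertools import permutations


def column(matrix, j):
    """Return column j of a row-major membership matrix."""
    return [row[j] for row in matrix]


def similarity(u, v):
    """Illustrative similarity: 1 minus the mean absolute difference."""
    return 1.0 - sum(abs(a - b) for a, b in zip(u, v)) / len(u)


def align_clusters(q_ref, q_other):
    """Find the cluster matching that maximizes total similarity.

    Returns perm such that cluster i of q_ref corresponds to
    cluster perm[i] of q_other. Brute force over all K! matchings,
    so this is an exact but inefficient assignment solver.
    """
    k = len(q_ref[0])
    sim = [
        [similarity(column(q_ref, i), column(q_other, j)) for j in range(k)]
        for i in range(k)
    ]
    best = max(
        permutations(range(k)),
        key=lambda p: sum(sim[i][p[i]] for i in range(k)),
    )
    return list(best)


def relabel(q_other, perm):
    """Reorder the columns of q_other to follow the reference labels."""
    return [[row[j] for j in perm] for row in q_other]


# Toy membership matrices (3 individuals x 3 clusters), hypothetical data.
# run_b is run_a with its cluster labels switched.
run_a = [[0.9, 0.1, 0.0], [0.1, 0.8, 0.1], [0.0, 0.2, 0.8]]
run_b = [[0.0, 0.9, 0.1], [0.1, 0.1, 0.8], [0.8, 0.0, 0.2]]

perm = align_clusters(run_a, run_b)
aligned_b = relabel(run_b, perm)
```

After relabeling, the two runs can be compared or plotted column by column, since each column now refers to the same latent cluster in both matrices.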
format Online
Article
Text
id pubmed-5018373
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5018373 2016-09-12 pong: fast analysis and visualization of latent clusters in population genetic data Behr, Aaron A. Liu, Katherine Z. Liu-Fang, Gracie Nakka, Priyanka Ramachandran, Sohini Bioinformatics Original Papers Oxford University Press 2016-09-15 2016-06-09 /pmc/articles/PMC5018373/ /pubmed/27283948 http://dx.doi.org/10.1093/bioinformatics/btw327 Text en © The Author 2016. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title pong: fast analysis and visualization of latent clusters in population genetic data
topic Original Papers
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5018373/
https://www.ncbi.nlm.nih.gov/pubmed/27283948
http://dx.doi.org/10.1093/bioinformatics/btw327