
Using Network Methodology to Infer Population Substructure

A main caveat of association studies is their susceptibility to bias from population stratification. Existing methods rely on model-based approaches such as STRUCTURE and ADMIXTURE, or on principal component analysis as in EIGENSTRAT. Here we provide a novel visualization technique and desc...


Bibliographic Details
Main authors:	Prokopenko, Dmitry; Hecker, Julian; Silverman, Edwin; Nöthen, Markus M.; Schmid, Matthias; Lange, Christoph; Loehlein Fier, Heide
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4476755/ https://www.ncbi.nlm.nih.gov/pubmed/26098940 http://dx.doi.org/10.1371/journal.pone.0130708

collection	PubMed
description	A main caveat of association studies is their susceptibility to bias from population stratification. Existing methods rely on model-based approaches such as STRUCTURE and ADMIXTURE, or on principal component analysis as in EIGENSTRAT. Here we provide a novel visualization technique and describe the problem of population substructure from a graph-theoretical point of view. We group the sequenced individuals into triads, which depict the relational structure, on the basis of a predefined pairwise similarity measure. We then merge the triads into a network and apply community detection algorithms to identify homogeneous subgroups, or communities, which can then be incorporated as covariates into logistic regression. We apply our method to populations from different continents in the 1000 Genomes Project and evaluate the type 1 error based on the empirical p-values. The application to 1000 Genomes data suggests that the network approach provides a very fine resolution of the underlying ancestral population structure. In addition, simulations show that, in the presence of discrete population structures, our approach maintains the type 1 error more precisely than existing approaches.
id	pubmed-4476755
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4476755 2015-06-25; PLoS One; Research Article; Public Library of Science; 2015-06-22; /pmc/articles/PMC4476755/; /pubmed/26098940; http://dx.doi.org/10.1371/journal.pone.0130708; Text; en; © 2015 Prokopenko et al; http://creativecommons.org/licenses/by/4.0/; This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
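
The pipeline outlined in the description (pairwise similarity, triads, merged network, community detection) can be illustrated with a small sketch. This is not the authors' implementation: the record does not specify the similarity measure, the triad construction rule, or the community detection algorithm. The sketch assumes a hypothetical similarity (one minus the scaled mean absolute genotype difference), assumes each individual forms a triad with its two most similar peers, and swaps in a simple deterministic label propagation as the community detection step. The function names and the toy genotype matrix are made up for illustration.

```python
from collections import defaultdict
from itertools import combinations


def similarity(g1, g2):
    """Hypothetical similarity: 1 minus the mean absolute difference of
    genotype counts (0/1/2), scaled to lie in [0, 1]."""
    return 1.0 - sum(abs(a - b) for a, b in zip(g1, g2)) / (2.0 * len(g1))


def build_triad_network(genotypes):
    """Assumed triad rule: each individual plus its two most similar peers.
    All triads are merged into one weighted undirected graph, where an edge
    weight counts the triads containing that edge. Needs >= 3 individuals."""
    n = len(genotypes)
    weights = defaultdict(int)
    for i in range(n):
        # Rank the other individuals by decreasing similarity (ties by index).
        others = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: (-similarity(genotypes[i], genotypes[j]), j),
        )
        triad = sorted([i] + others[:2])
        for u, v in combinations(triad, 2):
            weights[(u, v)] += 1
    return weights


def label_propagation(n, weights, max_sweeps=100):
    """Stand-in community detection: each node repeatedly adopts the label
    with the largest total edge weight among its neighbours. A node keeps
    its label on ties; otherwise the smallest tied label wins."""
    neighbours = defaultdict(dict)
    for (u, v), w in weights.items():
        neighbours[u][v] = w
        neighbours[v][u] = w
    labels = list(range(n))
    for _ in range(max_sweeps):
        changed = False
        for node in range(n):
            scores = defaultdict(int)
            for nb, w in neighbours[node].items():
                scores[labels[nb]] += w
            if not scores:
                continue  # isolated node keeps its own label
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            if labels[node] not in candidates:
                labels[node] = candidates[0]
                changed = True
        if not changed:
            break
    groups = defaultdict(list)
    for node, label in enumerate(labels):
        groups[label].append(node)
    return sorted(groups.values())


# Toy data (made up): six individuals, six variants, two clear subgroups.
genotypes = [
    [0, 0, 0, 0, 1, 2],
    [0, 0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 2],
    [2, 2, 2, 2, 1, 0],
    [2, 2, 2, 2, 0, 0],
    [2, 2, 2, 1, 1, 0],
]
network = build_triad_network(genotypes)
communities = label_propagation(len(genotypes), network)
print(communities)
```

On this toy input the two subgroups are recovered as separate communities. In the workflow the description outlines, community membership would then be coded as covariates in a logistic regression; that step is omitted here.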
