
Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates

BACKGROUND: Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely-used methods to infer population structure are model-based, Bayesian MCMC procedures that mini...


Bibliographic Details
Main authors: Reeves, Patrick A., Richards, Christopher M.
Format: Text
Language: English
Published: Public Library of Science 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2625398/
https://www.ncbi.nlm.nih.gov/pubmed/19172174
http://dx.doi.org/10.1371/journal.pone.0004269
_version_ 1782163438782382080
author Reeves, Patrick A.
Richards, Christopher M.
author_facet Reeves, Patrick A.
Richards, Christopher M.
author_sort Reeves, Patrick A.
collection PubMed
description BACKGROUND: Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely-used methods to infer population structure are model-based, Bayesian MCMC procedures that minimize Hardy-Weinberg and linkage disequilibrium within subpopulations. These methods are useful, but suffer from large computational requirements and a dependence on modeling assumptions that may not be met in real data sets. Here we describe the development of a new approach, PCO-MC, which couples principal coordinate analysis to a clustering procedure for the inference of population structure from multilocus genotype data. METHODOLOGY/PRINCIPAL FINDINGS: PCO-MC uses data from all principal coordinate axes simultaneously to calculate a multidimensional “density landscape”, from which the number of subpopulations, and the membership within subpopulations, is determined using a valley-seeking algorithm. Using extensive simulations, we show that this approach outperforms a Bayesian MCMC procedure when many loci (e.g. 100) are sampled, but that the Bayesian procedure is marginally superior with few loci (e.g. 10). When presented with sufficient data, PCO-MC accurately delineated subpopulations with population F(st) values as low as 0.03 (G'(st)>0.2), whereas the limit of resolution of the Bayesian approach was F(st) = 0.05 (G'(st)>0.35). CONCLUSIONS/SIGNIFICANCE: We draw a distinction between population structure inference for describing biodiversity as opposed to Type I error control in associative genetics. We suggest that discrete assignments, like those produced by PCO-MC, are appropriate for circumscribing units of biodiversity whereas expression of population structure as a continuous variable is more useful for case-control correction in structured association studies.
format Text
id pubmed-2625398
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-2625398 2009-01-27 Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates Reeves, Patrick A. Richards, Christopher M. PLoS One Research Article BACKGROUND: Accurate inference of genetic discontinuities between populations is an essential component of intraspecific biodiversity and evolution studies, as well as associative genetics. The most widely-used methods to infer population structure are model-based, Bayesian MCMC procedures that minimize Hardy-Weinberg and linkage disequilibrium within subpopulations. These methods are useful, but suffer from large computational requirements and a dependence on modeling assumptions that may not be met in real data sets. Here we describe the development of a new approach, PCO-MC, which couples principal coordinate analysis to a clustering procedure for the inference of population structure from multilocus genotype data. METHODOLOGY/PRINCIPAL FINDINGS: PCO-MC uses data from all principal coordinate axes simultaneously to calculate a multidimensional “density landscape”, from which the number of subpopulations, and the membership within subpopulations, is determined using a valley-seeking algorithm. Using extensive simulations, we show that this approach outperforms a Bayesian MCMC procedure when many loci (e.g. 100) are sampled, but that the Bayesian procedure is marginally superior with few loci (e.g. 10). When presented with sufficient data, PCO-MC accurately delineated subpopulations with population F(st) values as low as 0.03 (G'(st)>0.2), whereas the limit of resolution of the Bayesian approach was F(st) = 0.05 (G'(st)>0.35). CONCLUSIONS/SIGNIFICANCE: We draw a distinction between population structure inference for describing biodiversity as opposed to Type I error control in associative genetics. We suggest that discrete assignments, like those produced by PCO-MC, are appropriate for circumscribing units of biodiversity whereas expression of population structure as a continuous variable is more useful for case-control correction in structured association studies. Public Library of Science 2009-01-27 /pmc/articles/PMC2625398/ /pubmed/19172174 http://dx.doi.org/10.1371/journal.pone.0004269 Text en This is an open-access article distributed under the terms of the Creative Commons Public Domain declaration which stipulates that, once placed in the public domain, this work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. https://creativecommons.org/publicdomain/zero/1.0/
spellingShingle Research Article
Reeves, Patrick A.
Richards, Christopher M.
Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates
title Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates
title_full Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates
title_fullStr Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates
title_full_unstemmed Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates
title_short Accurate Inference of Subtle Population Structure (and Other Genetic Discontinuities) Using Principal Coordinates
title_sort accurate inference of subtle population structure (and other genetic discontinuities) using principal coordinates
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2625398/
https://www.ncbi.nlm.nih.gov/pubmed/19172174
http://dx.doi.org/10.1371/journal.pone.0004269
work_keys_str_mv AT reevespatricka accurateinferenceofsubtlepopulationstructureandothergeneticdiscontinuitiesusingprincipalcoordinates
AT richardschristopherm accurateinferenceofsubtlepopulationstructureandothergeneticdiscontinuitiesusingprincipalcoordinates