Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure

Bibliographic Details
Main Authors: Rosenberg, Noah A; Mahajan, Saurabh; Ramachandran, Sohini; Zhao, Chengfeng; Pritchard, Jonathan K; Feldman, Marcus W
Format: Text
Language: English
Published: Public Library of Science 2005
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1310579/
https://www.ncbi.nlm.nih.gov/pubmed/16355252
http://dx.doi.org/10.1371/journal.pgen.0010070
_version_ 1782126309960318976
author Rosenberg, Noah A
Mahajan, Saurabh
Ramachandran, Sohini
Zhao, Chengfeng
Pritchard, Jonathan K
Feldman, Marcus W
author_facet Rosenberg, Noah A
Mahajan, Saurabh
Ramachandran, Sohini
Zhao, Chengfeng
Pritchard, Jonathan K
Feldman, Marcus W
author_sort Rosenberg, Noah A
collection PubMed
description Previously, we observed that without using prior information about individual sampling locations, a clustering algorithm applied to multilocus genotypes from worldwide human populations produced genetic clusters largely coincident with major geographic regions. It has been argued, however, that the degree of clustering is diminished by use of samples with greater uniformity in geographic distribution, and that the clusters we identified were a consequence of uneven sampling along genetic clines. Expanding our earlier dataset from 377 to 993 markers, we systematically examine the influence of several study design variables—sample size, number of loci, number of clusters, assumptions about correlations in allele frequencies across populations, and the geographic dispersion of the sample—on the “clusteredness” of individuals. With all other variables held constant, geographic dispersion is seen to have comparatively little effect on the degree of clustering. Examination of the relationship between genetic and geographic distance supports a view in which the clusters arise not as an artifact of the sampling scheme, but from small discontinuous jumps in genetic distance for most population pairs on opposite sides of geographic barriers, in comparison with genetic distance for pairs on the same side. Thus, analysis of the 993-locus dataset corroborates our earlier results: if enough markers are used with a sufficiently large worldwide sample, individuals can be partitioned into genetic clusters that match major geographic subdivisions of the globe, with some individuals from intermediate geographic locations having mixed membership in the clusters that correspond to neighboring regions.
format Text
id pubmed-1310579
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1310579 2005-12-13 Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure Rosenberg, Noah A Mahajan, Saurabh Ramachandran, Sohini Zhao, Chengfeng Pritchard, Jonathan K Feldman, Marcus W PLoS Genet Research Article
Public Library of Science 2005-12 2005-12-09 /pmc/articles/PMC1310579/ /pubmed/16355252 http://dx.doi.org/10.1371/journal.pgen.0010070 Text en Copyright: © 2005 Rosenberg et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle Research Article
Rosenberg, Noah A
Mahajan, Saurabh
Ramachandran, Sohini
Zhao, Chengfeng
Pritchard, Jonathan K
Feldman, Marcus W
Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure
title Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure
title_full Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure
title_fullStr Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure
title_full_unstemmed Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure
title_short Clines, Clusters, and the Effect of Study Design on the Inference of Human Population Structure
title_sort clines, clusters, and the effect of study design on the inference of human population structure
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1310579/
https://www.ncbi.nlm.nih.gov/pubmed/16355252
http://dx.doi.org/10.1371/journal.pgen.0010070
work_keys_str_mv AT rosenbergnoaha clinesclustersandtheeffectofstudydesignontheinferenceofhumanpopulationstructure
AT mahajansaurabh clinesclustersandtheeffectofstudydesignontheinferenceofhumanpopulationstructure
AT ramachandransohini clinesclustersandtheeffectofstudydesignontheinferenceofhumanpopulationstructure
AT zhaochengfeng clinesclustersandtheeffectofstudydesignontheinferenceofhumanpopulationstructure
AT pritchardjonathank clinesclustersandtheeffectofstudydesignontheinferenceofhumanpopulationstructure
AT feldmanmarcusw clinesclustersandtheeffectofstudydesignontheinferenceofhumanpopulationstructure