
Inference of Population Structure using Dense Haplotype Data

The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haploty...


Bibliographic Details
Main Authors:	Lawson, Daniel John; Hellenthal, Garrett; Myers, Simon; Falush, Daniel
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2012
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3266881/ https://www.ncbi.nlm.nih.gov/pubmed/22291602 http://dx.doi.org/10.1371/journal.pgen.1002453

collection	PubMed
description	The advent of genome-wide dense variation data provides an opportunity to investigate ancestry in unprecedented detail, but presents new statistical challenges. We propose a novel inference framework that aims to efficiently capture information on population structure provided by patterns of haplotype similarity. Each individual in a sample is considered in turn as a recipient, whose chromosomes are reconstructed using chunks of DNA donated by the other individuals. Results of this “chromosome painting” can be summarized as a “coancestry matrix,” which directly reveals key information about ancestral relationships among individuals. If markers are viewed as independent, we show that this matrix almost completely captures the information used by both standard Principal Components Analysis (PCA) and model-based approaches such as STRUCTURE in a unified manner. Furthermore, when markers are in linkage disequilibrium, the matrix combines information across successive markers to increase the ability to discern fine-scale population structure using PCA. In parallel, we have developed an efficient model-based approach to identify discrete populations using this matrix, which offers advantages over PCA in terms of interpretability and over existing clustering algorithms in terms of speed, number of separable populations, and sensitivity to subtle population structure. We analyse Human Genome Diversity Panel data for 938 individuals and 641,000 markers, and we identify 226 populations reflecting differences on continental, regional, local, and family scales. We present multiple lines of evidence that, while many methods capture similar information among strongly differentiated groups, more subtle population structure in human populations is consistently present at a much finer level than currently available geographic labels and is only captured by the haplotype-based approach. 
The software used for this article, ChromoPainter and fineSTRUCTURE, is available from http://www.paintmychromosomes.com/.
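The idea of painting each recipient with chunks from donors and tallying the result in a coancestry matrix can be illustrated with a minimal sketch. It is not the ChromoPainter model: it assumes phased haplotypes coded as 0/1, fixed-size chunks, and a nearest-match rule (fewest mismatches, ties going to the lowest donor index), all of which are simplifications introduced here for illustration. The function name `toy_coancestry` and the example data are hypothetical.

```python
def toy_coancestry(haplotypes, chunk_size):
    """Count how many fixed-size chunks of each recipient are best matched by each donor.

    Illustrative stand-in only: a nearest-match rule on fixed windows,
    not the statistical painting model used by ChromoPainter.
    """
    n = len(haplotypes)
    n_markers = len(haplotypes[0])
    counts = [[0] * n for _ in range(n)]
    for recipient in range(n):
        for start in range(0, n_markers, chunk_size):
            end = min(start + chunk_size, n_markers)
            target = haplotypes[recipient][start:end]
            best_donor, best_mismatch = None, None
            for donor in range(n):
                if donor == recipient:
                    continue  # a recipient is never its own donor
                segment = haplotypes[donor][start:end]
                mismatch = sum(a != b for a, b in zip(target, segment))
                # Strict "<" keeps the first (lowest-index) donor on ties.
                if best_mismatch is None or mismatch < best_mismatch:
                    best_donor, best_mismatch = donor, mismatch
            counts[recipient][best_donor] += 1
    return counts


# Hypothetical data: two pairs of similar haplotypes over eight markers.
haplotypes = [
    [0, 0, 1, 1, 0, 1, 0, 0],
    [0, 0, 1, 1, 0, 1, 0, 1],
    [1, 1, 0, 0, 1, 0, 1, 1],
    [1, 1, 0, 0, 1, 0, 1, 0],
]
matrix = toy_coancestry(haplotypes, chunk_size=4)
for row in matrix:
    print(row)
```

Row i of the result is recipient i and column j is donor j. In this toy input each haplotype copies both of its chunks from its near-identical partner, so the matrix shows two blocks, which is the kind of signal a clustering or PCA step would then pick up.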
id	pubmed-3266881
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-3266881 2012-01-30 Inference of Population Structure using Dense Haplotype Data Lawson, Daniel John Hellenthal, Garrett Myers, Simon Falush, Daniel PLoS Genet Research Article Public Library of Science 2012-01-26 /pmc/articles/PMC3266881/ /pubmed/22291602 http://dx.doi.org/10.1371/journal.pgen.1002453 Text en Lawson et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Inference of Population Structure using Dense Haplotype Data
