
Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations

Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify nec...


Bibliographic Details
Main authors: Padakanti, Sridevi; Tiong, Khong-Loon; Chen, Yan-Bin; Yeang, Chen-Hsiang
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8423758/
https://www.ncbi.nlm.nih.gov/pubmed/34493766
http://dx.doi.org/10.1038/s41598-021-97129-2
_version_ 1783749534363418624
author Padakanti, Sridevi
Tiong, Khong-Loon
Chen, Yan-Bin
Yeang, Chen-Hsiang
author_sort Padakanti, Sridevi
collection PubMed
description Principal Component Analysis (PCA) projects high-dimensional genotype data into a few components that discern populations. Ancestry Informative Markers (AIMs) are a small subset of SNPs capable of distinguishing populations. We integrate these two approaches by proposing an algorithm to identify necessary informative loci whose removal from the data deteriorates the PCA structure. Unlike classical AIMs, necessary informative loci densely cover the genome and hence can illuminate the evolution and mixing history of populations. We conduct a comprehensive analysis of the genotype data of the 1000 Genomes Project using necessary informative loci. Projections along the top seven principal components demarcate populations at distinct geographic levels. Millions of necessary informative loci along each PC are identified. Population identities along each PC are approximately determined by weighted sums of minor (or major) alleles over the informative loci. Variations of allele frequencies are aligned with the history and direction of population evolution. The population distribution of projections along the top three PCs is recapitulated by a simple demographic model based on several waves of founder population separation and mixing. Informative loci are concentrated at particular genomic locations and are functionally enriched. Genes at two hot spots encompassing dense PC 7 informative loci exhibit differential expression among European populations. The mosaic of local ancestry in the genome of a mixed descendant from multiple populations can be inferred from partial PCA projections of informative loci. Finally, informative loci derived from the 1000 Genomes data predict well the projections of an independent genotype dataset of South Asians. These results demonstrate the utility and relevance of informative loci for investigating human evolution.
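The idea in the description can be illustrated with a toy sketch: project genotypes onto a principal component, approximate that projection by a weighted sum over a few loci, and check that removing those loci degrades the population separation. This is not the authors' algorithm, which the record does not spell out. It runs on simulated data, and the ranking of loci by absolute PC 1 loading is a stand-in chosen for illustration. All function names, sample sizes, and allele frequencies below are assumptions.

```python
import numpy as np


def simulate_genotypes(n_per_pop=50, n_loci=400, n_informative=40, seed=0):
    """Simulate 0/1/2 genotypes for two hypothetical populations.

    The populations differ in allele frequency only at the first
    `n_informative` loci (an assumption made for this toy example).
    """
    rng = np.random.default_rng(seed)
    freq_a = np.full(n_loci, 0.3)
    freq_b = freq_a.copy()
    freq_b[:n_informative] = 0.8
    geno_a = rng.binomial(2, freq_a, size=(n_per_pop, n_loci))
    geno_b = rng.binomial(2, freq_b, size=(n_per_pop, n_loci))
    labels = np.repeat([0, 1], n_per_pop)
    return np.vstack([geno_a, geno_b]).astype(float), labels


def pc1(genotypes):
    """Return PC 1 scores (per individual) and loadings (per locus)."""
    centered = genotypes - genotypes.mean(axis=0)
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    loadings = vt[0]
    return centered @ loadings, loadings


def separation(scores, labels):
    """Standardized distance between the two population means."""
    a, b = scores[labels == 0], scores[labels == 1]
    pooled_sd = np.sqrt((a.var() + b.var()) / 2)
    return abs(a.mean() - b.mean()) / pooled_sd


genotypes, labels = simulate_genotypes()
scores, loadings = pc1(genotypes)

# Stand-in for "informative loci": the 40 loci with the largest |loading|.
top = np.argsort(np.abs(loadings))[::-1][:40]

# Weighted sum of centered allele counts over those loci only.
subset = genotypes[:, top]
partial = (subset - subset.mean(axis=0)) @ loadings[top]
corr = np.corrcoef(scores, partial)[0, 1]

# Remove the same loci and recompute PC 1 on what is left.
reduced_scores, _ = pc1(np.delete(genotypes, top, axis=1))
sep_full = separation(scores, labels)
sep_reduced = separation(reduced_scores, labels)

print(f"correlation of partial sum with PC 1: {corr:.3f}")
print(f"separation with all loci: {sep_full:.2f}")
print(f"separation after removing top loci: {sep_reduced:.2f}")
```

In this simulation the partial weighted sum tracks the full PC 1 projection closely, and the separation between the two groups largely disappears once the high-loading loci are removed. Real genotype data would additionally need handling of missing calls, linkage disequilibrium, and more than two populations.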
format Online
Article
Text
id pubmed-8423758
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-8423758 2021-09-09 Sci Rep Article Nature Publishing Group UK 2021-09-07 /pmc/articles/PMC8423758/ /pubmed/34493766 http://dx.doi.org/10.1038/s41598-021-97129-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/
Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
title Genotypes of informative loci from 1000 Genomes data allude evolution and mixing of human populations
topic Article