Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference
Population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals and can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical...
Main authors: | Shringarpure, Suyash; Xing, Eric P. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Genetics Society of America, 2014 |
Subjects: | Investigations |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4025489/ https://www.ncbi.nlm.nih.gov/pubmed/24637351 http://dx.doi.org/10.1534/g3.113.007633 |
_version_ | 1782316770814590976 |
---|---|
author | Shringarpure, Suyash Xing, Eric P. |
author_facet | Shringarpure, Suyash Xing, Eric P. |
author_sort | Shringarpure, Suyash |
collection | PubMed |
description | Population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals and can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical constraints dictate that of a geographical/ethnic population, only a small number of individuals are genotyped. The resulting data are a sample from the entire population. If the distribution of sample sizes is not representative of the populations being sampled, the accuracy of population stratification analyses of the data could be affected. We attempt to understand the effect of biased sampling on the accuracy of population structure analysis and individual ancestry recovery. We examined two commonly used methods for analyses of such datasets, ADMIXTURE and EIGENSOFT, and found that the accuracy of recovery of population structure is affected to a large extent by the sample used for analysis and how representative it is of the underlying populations. Using simulated data and real genotype data from cattle, we show that sample selection bias can affect the results of population structure analyses. We develop a mathematical framework for sample selection bias in models for population structure and also propose a correction for sample selection bias using auxiliary information about the sample. We demonstrate that such a correction is effective in practice using simulated and real data. |
format | Online Article Text |
id | pubmed-4025489 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Genetics Society of America |
record_format | MEDLINE/PubMed |
spelling | pubmed-40254892014-05-30 Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference Shringarpure, Suyash Xing, Eric P. G3 (Bethesda) Investigations Population stratification is an important task in genetic analyses. It provides information about the ancestry of individuals and can be an important confounder in genome-wide association studies. Public genotyping projects have made a large number of datasets available for study. However, practical constraints dictate that of a geographical/ethnic population, only a small number of individuals are genotyped. The resulting data are a sample from the entire population. If the distribution of sample sizes is not representative of the populations being sampled, the accuracy of population stratification analyses of the data could be affected. We attempt to understand the effect of biased sampling on the accuracy of population structure analysis and individual ancestry recovery. We examined two commonly used methods for analyses of such datasets, ADMIXTURE and EIGENSOFT, and found that the accuracy of recovery of population structure is affected to a large extent by the sample used for analysis and how representative it is of the underlying populations. Using simulated data and real genotype data from cattle, we show that sample selection bias can affect the results of population structure analyses. We develop a mathematical framework for sample selection bias in models for population structure and also propose a correction for sample selection bias using auxiliary information about the sample. We demonstrate that such a correction is effective in practice using simulated and real data. 
Genetics Society of America 2014-03-17 /pmc/articles/PMC4025489/ /pubmed/24637351 http://dx.doi.org/10.1534/g3.113.007633 Text en Copyright © 2014 Shringarpure and Xing http://creativecommons.org/licenses/by/3.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution Unported License (http://creativecommons.org/licenses/by/3.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Investigations Shringarpure, Suyash Xing, Eric P. Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference |
title | Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference |
title_full | Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference |
title_fullStr | Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference |
title_full_unstemmed | Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference |
title_short | Effects of Sample Selection Bias on the Accuracy of Population Structure and Ancestry Inference |
title_sort | effects of sample selection bias on the accuracy of population structure and ancestry inference |
topic | Investigations |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4025489/ https://www.ncbi.nlm.nih.gov/pubmed/24637351 http://dx.doi.org/10.1534/g3.113.007633 |
work_keys_str_mv | AT shringarpuresuyash effectsofsampleselectionbiasontheaccuracyofpopulationstructureandancestryinference AT xingericp effectsofsampleselectionbiasontheaccuracyofpopulationstructureandancestryinference |