The effects of sample size on population genomic analyses – implications for the tests of neutrality

Bibliographic Details
Main Author:	Subramanian, Sankar
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2016
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4761153/ https://www.ncbi.nlm.nih.gov/pubmed/26897757 http://dx.doi.org/10.1186/s12864-016-2441-8

collection	PubMed
description	BACKGROUND: One of the fundamental measures of molecular genetic variation is Watterson's estimator (θ), which is based on the number of segregating sites. The estimate of θ is unbiased only under neutrality and constant population size, and it is well known to be biased when these assumptions are violated. However, the effect of sample size in modulating this bias has not been well appreciated.
RESULTS: We examined this issue in detail using large-scale exome data and robust simulations. Our investigation revealed that sample size appreciably influences the estimation of θ and that this effect is much larger for constrained genomic regions than for neutral regions. For instance, θ estimated for synonymous sites using 512 human exomes was 1.9 times higher than that obtained using 16 exomes, whereas for the nonsynonymous sites of the same data it was 2.5 times higher. We observed a positive correlation between the rate of increase in θ estimates (with respect to sample size) and the magnitude of selection pressure. For example, θ estimated for the nonsynonymous sites of highly constrained genes (dN/dS < 0.1) using 512 exomes was 3.6 times higher than that estimated using 16 exomes. In contrast, it was only 2 times higher for the less constrained genes (dN/dS > 0.9).
CONCLUSIONS: The results of this study reveal the extent of underestimation owing to small sample sizes and thus emphasize the importance of sample size in estimating a number of population genomic parameters. Our results have serious implications for neutrality tests such as Tajima's D, Fu and Li's D, and those based on the McDonald and Kreitman test: the Neutrality Index and the fraction of adaptive substitutions. For instance, the use of 16 exomes produced a 2.4 times higher proportion of adaptive substitutions than that obtained using 512 exomes (24% vs 10%).
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2441-8) contains supplementary material, which is available to authorized users.
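Illustrative sketch (not part of the catalog record or of the article's own code): the description refers to Watterson's estimator and to McDonald and Kreitman based statistics without stating formulas. Under the standard textbook definitions, θ = S / a_n with a_n = 1 + 1/2 + ... + 1/(n-1) for S segregating sites among n sampled sequences, the Neutrality Index is NI = (Pn/Ps) / (Dn/Ds), and the fraction of adaptive substitutions is α = 1 - NI. The function names, argument names, and example counts below are hypothetical, and treating n as the number of sampled chromosomes is an assumption; the article may count or filter sites differently.

```python
def harmonic_number(n_sequences: int) -> float:
    """Return a_n = sum of 1/i for i = 1 .. n-1 (n = number of sampled sequences)."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites: int, n_sequences: int, n_sites: int = 1) -> float:
    """Watterson's estimator: S / a_n, optionally scaled per site."""
    if n_sites < 1:
        raise ValueError("n_sites must be positive")
    return segregating_sites / harmonic_number(n_sequences) / n_sites


def neutrality_index(pn: float, ps: float, dn: float, ds: float) -> float:
    """NI = (Pn/Ps) / (Dn/Ds) from polymorphism (P) and divergence (D) counts."""
    if min(ps, dn, ds) <= 0:
        raise ValueError("Ps, Dn and Ds must be positive")
    return (pn / ps) / (dn / ds)


def alpha_mk(pn: float, ps: float, dn: float, ds: float) -> float:
    """Fraction of adaptive substitutions, alpha = 1 - NI."""
    return 1.0 - neutrality_index(pn, ps, dn, ds)


if __name__ == "__main__":
    # Hypothetical counts, chosen only to exercise the functions.
    print(watterson_theta(segregating_sites=11, n_sequences=4))
    print(neutrality_index(pn=20, ps=50, dn=50, ds=100))
    print(alpha_mk(pn=20, ps=50, dn=50, ds=100))
```

Because α falls as Pn/Ps rises, a small sample that misses proportionally more nonsynonymous than synonymous polymorphisms would tend to inflate α, which is consistent with the direction of the 24% vs 10% contrast reported in the description.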
id	pubmed-4761153
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-4761153 2016-02-21 BMC Genomics Research Article BioMed Central 2016-02-20 /pmc/articles/PMC4761153/ /pubmed/26897757 http://dx.doi.org/10.1186/s12864-016-2441-8 Text en © Subramanian. 2016 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.