The effects of sample size on population genomic analyses – implications for the tests of neutrality
BACKGROUND: One of the fundamental measures of molecular genetic variation is Watterson’s estimator (θ), which is based on the number of segregating sites. The estimation of θ is unbiased only under neutrality and constant population growth. It is well known that the estimation of θ is biased wh...
Main author: | Subramanian, Sankar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4761153/ https://www.ncbi.nlm.nih.gov/pubmed/26897757 http://dx.doi.org/10.1186/s12864-016-2441-8 |
_version_ | 1782416938322886656 |
---|---|
author | Subramanian, Sankar |
collection | PubMed |
description | BACKGROUND: One of the fundamental measures of molecular genetic variation is Watterson’s estimator (θ), which is based on the number of segregating sites. The estimation of θ is unbiased only under neutrality and constant population growth. It is well known that the estimation of θ is biased when these assumptions are violated. However, the effect of sample size in modulating the bias was not well appreciated. RESULTS: We examined this issue in detail based on large-scale exome data and robust simulations. Our investigation revealed that sample size appreciably influences θ estimation and that this effect was much greater for constrained genomic regions than for neutral regions. For instance, θ estimated for synonymous sites using 512 human exomes was 1.9 times higher than that obtained using 16 exomes. For the nonsynonymous sites of the same data, however, the estimate was 2.5 times higher. We observed a positive correlation between the rate of increase in θ estimates (with respect to the sample size) and the magnitude of selection pressure. For example, θ estimated for the nonsynonymous sites of highly constrained genes (dN/dS < 0.1) using 512 exomes was 3.6 times higher than that estimated using 16 exomes. In contrast, the estimate was only 2 times higher for the less constrained genes (dN/dS > 0.9). CONCLUSIONS: The results of this study reveal the extent of underestimation owing to small sample sizes and thus emphasize the importance of sample size in estimating a number of population genomic parameters. Our results have serious implications for neutrality tests such as Tajima's D, Fu and Li's D and those based on the McDonald and Kreitman test, namely the Neutrality Index and the fraction of adaptive substitutions. For instance, use of 16 exomes produced a 2.4 times higher proportion of adaptive substitutions than that obtained using 512 exomes (24 % vs 10 %). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2441-8) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-4761153 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4761153 2016-02-21 The effects of sample size on population genomic analyses – implications for the tests of neutrality Subramanian, Sankar BMC Genomics Research Article BioMed Central 2016-02-20 /pmc/articles/PMC4761153/ /pubmed/26897757 http://dx.doi.org/10.1186/s12864-016-2441-8 Text en © Subramanian. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | The effects of sample size on population genomic analyses – implications for the tests of neutrality |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4761153/ https://www.ncbi.nlm.nih.gov/pubmed/26897757 http://dx.doi.org/10.1186/s12864-016-2441-8 |
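
The abstract in this record refers to Watterson's estimator and to two statistics derived from the McDonald and Kreitman test. As a rough illustration only, the textbook definitions can be sketched in Python as below. The function names and all input counts are hypothetical and are not taken from the article; the sketch uses the standard formulas θ = S / a_n with a_n = Σ 1/i for i = 1 … n−1, NI = (Pn/Ps)/(Dn/Ds), and α = 1 − NI, and it does not reproduce the article's own pipeline.

```python
"""Toy calculations for quantities named in the abstract (illustrative only)."""


def harmonic_number(n_sequences: int) -> float:
    """Return a_n = sum of 1/i for i = 1 .. n-1, for a sample of n sequences."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are required")
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites: int, n_sequences: int, n_sites: int = 1) -> float:
    """Watterson's estimator S / a_n, optionally scaled per site."""
    return segregating_sites / harmonic_number(n_sequences) / n_sites


def neutrality_index(pn: float, ps: float, dn: float, ds: float) -> float:
    """NI = (Pn / Ps) / (Dn / Ds) from polymorphism and divergence counts."""
    return (pn / ps) / (dn / ds)


def adaptive_fraction(pn: float, ps: float, dn: float, ds: float) -> float:
    """Fraction of adaptive substitutions, alpha = 1 - NI."""
    return 1.0 - neutrality_index(pn, ps, dn, ds)


if __name__ == "__main__":
    # Hypothetical inputs chosen only to exercise the functions.
    for n in (16, 512):
        print(n, "sequences: a_n =", harmonic_number(n))
    print("theta =", watterson_theta(segregating_sites=120, n_sequences=16, n_sites=10_000))
    print("NI =", neutrality_index(pn=30, ps=60, dn=40, ds=50))
    print("alpha =", adaptive_fraction(pn=30, ps=60, dn=40, ds=50))
```

Because α depends on the ratio Pn/Ps, any sample-size effect that changes nonsynonymous and synonymous polymorphism estimates by different factors, as the abstract reports, will carry through to NI and α.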