The effects of sample size on population genomic analyses – implications for the tests of neutrality

BACKGROUND: One of the fundamental measures of molecular genetic variation is the Watterson’s estimator (θ), which is based on the number of segregating sites. The estimation of θ is unbiased only under neutrality and constant population growth. It is well known that the estimation of θ is biased wh...

Bibliographic Details
Main author: Subramanian, Sankar
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4761153/
https://www.ncbi.nlm.nih.gov/pubmed/26897757
http://dx.doi.org/10.1186/s12864-016-2441-8
author Subramanian, Sankar
collection PubMed
description BACKGROUND: One of the fundamental measures of molecular genetic variation is Watterson’s estimator (θ), which is based on the number of segregating sites. The estimation of θ is unbiased only under neutrality and constant population size. It is well known that the estimation of θ is biased when these assumptions are violated. However, the effect of sample size in modulating this bias has not been well appreciated. RESULTS: We examined this issue in detail using large-scale exome data and robust simulations. Our investigation revealed that sample size appreciably influences θ estimation and that this effect was much stronger for constrained genomic regions than for neutral regions. For instance, θ estimated for synonymous sites using 512 human exomes was 1.9 times higher than that obtained using 16 exomes. For the nonsynonymous sites of the same data, however, the estimate was 2.5 times higher. We observed a positive correlation between the rate of increase in θ estimates (with respect to the sample size) and the magnitude of selection pressure. For example, θ estimated for the nonsynonymous sites of highly constrained genes (dN/dS < 0.1) using 512 exomes was 3.6 times higher than that estimated using 16 exomes. In contrast, this difference was only 2-fold for the less constrained genes (dN/dS > 0.9). CONCLUSIONS: The results of this study reveal the extent of underestimation owing to small sample sizes and thus emphasize the importance of sample size in estimating a number of population genomic parameters. Our results have serious implications for neutrality tests such as Tajima's D, Fu and Li's D, and those based on the McDonald and Kreitman test: the Neutrality Index and the fraction of adaptive substitutions. For instance, the use of 16 exomes produced a 2.4 times higher proportion of adaptive substitutions than that obtained using 512 exomes (24 % vs 10 %).
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-2441-8) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4761153
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4761153 2016-02-21 The effects of sample size on population genomic analyses – implications for the tests of neutrality Subramanian, Sankar BMC Genomics Research Article BioMed Central 2016-02-20 /pmc/articles/PMC4761153/ /pubmed/26897757 http://dx.doi.org/10.1186/s12864-016-2441-8 Text en © Subramanian. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title The effects of sample size on population genomic analyses – implications for the tests of neutrality
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4761153/
https://www.ncbi.nlm.nih.gov/pubmed/26897757
http://dx.doi.org/10.1186/s12864-016-2441-8
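
Note on the estimator named in the abstract: Watterson's estimator divides the number of segregating sites S by the harmonic-type sum a_n = 1 + 1/2 + ... + 1/(n-1), where n is the number of sampled sequences. The Python sketch below is only an illustration of that standard formula, not the article's own pipeline. The function names, the optional per-site scaling, and the example counts are hypothetical and do not come from the record.

```python
def harmonic_sum(n_sequences: int) -> float:
    """Return a_n = sum of 1/i for i = 1 .. n-1 (n = number of sampled sequences)."""
    if n_sequences < 2:
        raise ValueError("at least two sequences are needed")
    return sum(1.0 / i for i in range(1, n_sequences))


def watterson_theta(segregating_sites: int, n_sequences: int, n_sites: int = 1) -> float:
    """Watterson's estimator: theta = S / a_n, optionally scaled per site.

    n_sites is an illustrative convenience (assumption): pass the number of
    sites examined to obtain a per-site value, or leave it at 1 for a
    per-locus value.
    """
    if segregating_sites < 0 or n_sites < 1:
        raise ValueError("counts must be non-negative and n_sites >= 1")
    return segregating_sites / harmonic_sum(n_sequences) / n_sites


if __name__ == "__main__":
    # Hypothetical counts, chosen only to show how the call looks.
    for n, s in [(16, 120), (512, 480)]:
        print(n, s, watterson_theta(s, n, n_sites=100_000))
```

Because a_n grows with n, the estimate stays stable across sample sizes only if S grows in proportion to a_n, which is the expectation under neutrality and constant population size. The abstract reports that in real exome data θ rises with sample size, and more steeply for constrained sites.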