
An Unbiased Estimator of Gene Diversity with Improved Variance for Samples Containing Related and Inbred Individuals of any Ploidy

Gene diversity, or expected heterozygosity (H), is a common statistic for assessing genetic variation within populations. Estimation of this statistic decreases in accuracy and precision when individuals are related or inbred, due to increased dependence among allele copies in the sample. The origin...


Bibliographic Details
Main authors: Harris, Alexandre M., DeGiorgio, Michael
Format: Online Article Text
Language: English
Published: Genetics Society of America 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5295611/
https://www.ncbi.nlm.nih.gov/pubmed/28040781
http://dx.doi.org/10.1534/g3.116.037168
author Harris, Alexandre M.
DeGiorgio, Michael
collection PubMed
description Gene diversity, or expected heterozygosity (H), is a common statistic for assessing genetic variation within populations. Estimation of this statistic decreases in accuracy and precision when individuals are related or inbred, due to increased dependence among allele copies in the sample. The original unbiased estimator of expected heterozygosity underestimates true population diversity in samples containing relatives, as it only accounts for sample size. More recently, a general unbiased estimator of expected heterozygosity was developed that explicitly accounts for related and inbred individuals in samples. Though unbiased, this estimator’s variance is greater than that of the original estimator. To address this issue, we introduce a general unbiased estimator of gene diversity for samples containing related or inbred individuals, which employs the best linear unbiased estimator of allele frequencies, rather than the commonly used sample proportion. We examine the properties of this estimator, [Formula: see text], relative to alternative estimators using simulations and theoretical predictions, and show that it predominantly has the smallest mean squared error relative to others. Further, we empirically assess the performance of [Formula: see text] on a global human microsatellite dataset of 5795 individuals, from 267 populations, genotyped at 645 loci. Additionally, we show that the improved variance of [Formula: see text] leads to improved estimates of the population differentiation statistic, [Formula: see text], which employs measures of gene diversity within its calculation. Finally, we provide an R script, BestHet, to compute this estimator from genomic and pedigree data.
format Online
Article
Text
id pubmed-5295611
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher Genetics Society of America
record_format MEDLINE/PubMed
spelling pubmed-5295611 2017-02-09 G3 (Bethesda) Investigations
Genetics Society of America 2016-12-30 /pmc/articles/PMC5295611/ /pubmed/28040781 http://dx.doi.org/10.1534/g3.116.037168 Text en Copyright © 2017 Harris and DeGiorgio. This is an open-access article distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An Unbiased Estimator of Gene Diversity with Improved Variance for Samples Containing Related and Inbred Individuals of any Ploidy
topic Investigations