Bayes-optimal estimation of overlap between populations of fixed size

Bibliographic Details
Main Author: Larremore, Daniel B.
Format: Online Article Text
Language: English
Published: Public Library of Science 2019
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6440621/
https://www.ncbi.nlm.nih.gov/pubmed/30925165
http://dx.doi.org/10.1371/journal.pcbi.1006898
author Larremore, Daniel B.
collection PubMed
description Measuring the overlap between two populations is, in principle, straightforward. Upon fully sampling both populations, the number of shared objects (species, taxonomical units, or gene variants, depending on the context) can be directly counted. In practice, however, only a fraction of each population’s objects are likely to be sampled due to stochastic data collection or sequencing techniques. Although methods exist for quantifying population overlap under subsampled conditions, their bias is well documented and the uncertainty of their estimates cannot be quantified. Here we derive and validate a method to rigorously estimate the population overlap from incomplete samples when the total number of objects, species, or genes in each population is known, a special case of the more general β-diversity problem that is particularly relevant in the ecology and genomic epidemiology of malaria. By solving a Bayesian inference problem, this method takes into account the rates of subsampling and produces unbiased and Bayes-optimal estimates of overlap. In addition, it provides a natural framework for computing the uncertainty of its estimates, and can be used prospectively in study planning by quantifying the tradeoff between sampling effort and uncertainty.
format Online
Article
Text
id pubmed-6440621
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-6440621 2019-04-12 Bayes-optimal estimation of overlap between populations of fixed size Larremore, Daniel B. PLoS Comput Biol Research Article Public Library of Science 2019-03-29 /pmc/articles/PMC6440621/ /pubmed/30925165 http://dx.doi.org/10.1371/journal.pcbi.1006898 Text en © 2019 Daniel B. Larremore http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Bayes-optimal estimation of overlap between populations of fixed size
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6440621/
https://www.ncbi.nlm.nih.gov/pubmed/30925165
http://dx.doi.org/10.1371/journal.pcbi.1006898