Coalescent: an open-science framework for importance sampling in coalescent theory

Background. In coalescent theory, computer programs often use importance sampling to calculate likelihoods and other statistical quantities. An importance sampling scheme can exploit human intuition to improve statistical efficiency of computations, but unfortunately, in the absence of general compu...

Bibliographic Details
Main Authors: Tewari, Susanta; Spouge, John L.
Format: Online Article Text
Language: English
Published: PeerJ Inc. 2015
Subjects: Computational Biology
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4548476/
https://www.ncbi.nlm.nih.gov/pubmed/26312189
http://dx.doi.org/10.7717/peerj.1203
author Tewari, Susanta
Spouge, John L.
collection PubMed
description Background. In coalescent theory, computer programs often use importance sampling to calculate likelihoods and other statistical quantities. An importance sampling scheme can exploit human intuition to improve statistical efficiency of computations, but unfortunately, in the absence of general computer frameworks on importance sampling, researchers often struggle to translate new sampling schemes computationally or benchmark against different schemes, in a manner that is reliable and maintainable. Moreover, most studies use computer programs lacking a convenient user interface or the flexibility to meet the current demands of open science. In particular, current computer frameworks can only evaluate the efficiency of a single importance sampling scheme or compare the efficiencies of different schemes in an ad hoc manner.
Results. We have designed a general framework (http://coalescent.sourceforge.net; language: Java; License: GPLv3) for importance sampling that computes likelihoods under the standard neutral coalescent model of a single, well-mixed population of constant size over time, following the infinite sites model of mutation. The framework models the necessary core concepts, comes integrated with several data sets of varying size, implements the standard competing proposals, and integrates tightly with our previous framework for calculating exact probabilities. For a given dataset, it computes the likelihood and provides the maximum likelihood estimate of the mutation parameter. Well-known benchmarks in the coalescent literature validate the accuracy of the framework. The framework provides an intuitive user interface with minimal clutter. For performance, the framework switches automatically to modern multicore hardware, if available. It runs on three major platforms (Windows, Mac and Linux). Extensive tests and coverage make the framework reliable and maintainable.
Conclusions. In coalescent theory, many studies of computational efficiency consider only effective sample size. Here, we evaluate proposals in the coalescent literature, to discover that the order of efficiency among the three importance sampling schemes changes when one considers running time as well as effective sample size. We also describe a computational technique called “just-in-time delegation” available to improve the trade-off between running time and precision by constructing improved importance sampling schemes from existing ones. Thus, our systems approach is a potential solution to the “2^8 programs problem” highlighted by Felsenstein, because it provides the flexibility to include or exclude various features of similar coalescent models or importance sampling schemes.
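The abstract contrasts effective sample size with running time as measures of efficiency. The sketch below is not taken from the framework (which is written in Java) and does not model the coalescent; it is a minimal, generic illustration in Python. It assumes the common Kish definition of effective sample size, (Σw)² / Σw², and uses "effective samples per second" as one possible way to fold running time into the comparison; the record does not state which time-adjusted measure the authors use. All function names here (`importance_estimate`, `effective_sample_size`, `ess_per_second`, `toy_demo`) are hypothetical.

```python
import math
import random


def importance_estimate(target, proposal_sample, proposal_density, n, rng):
    """Estimate the integral of `target` by importance sampling.

    Draws n values from the proposal and averages the weights
    w = target(x) / proposal_density(x). Returns (estimate, weights).
    """
    weights = []
    for _ in range(n):
        x = proposal_sample(rng)
        weights.append(target(x) / proposal_density(x))
    return sum(weights) / n, weights


def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2)."""
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    return (total * total) / total_sq if total_sq > 0 else 0.0


def ess_per_second(weights, seconds):
    """Assumed time-adjusted efficiency: effective samples per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return effective_sample_size(weights) / seconds


def toy_demo(seed=1, n=20000):
    """Toy problem: integrate exp(-x^2 / 2), whose exact value is sqrt(2*pi).

    The proposal is a normal distribution with mean 0 and standard
    deviation 2, chosen only for illustration.
    """
    rng = random.Random(seed)
    sd = 2.0

    def target(x):
        return math.exp(-0.5 * x * x)

    def proposal_sample(r):
        return r.gauss(0.0, sd)

    def proposal_density(x):
        return math.exp(-0.5 * (x / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

    return importance_estimate(target, proposal_sample, proposal_density, n, rng)
```

To compare two schemes in this spirit, one would time each call (for example with `time.perf_counter()`), then compare both `effective_sample_size(weights)` and `ess_per_second(weights, elapsed)`. A scheme with a higher effective sample size can still rank lower once its cost per draw is taken into account, which is the kind of reordering the abstract reports.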
format Online
Article
Text
id pubmed-4548476
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher PeerJ Inc.
record_format MEDLINE/PubMed
spelling pubmed-4548476 2015-08-26 Coalescent: an open-science framework for importance sampling in coalescent theory Tewari, Susanta Spouge, John L. PeerJ Computational Biology PeerJ Inc. 2015-08-18 /pmc/articles/PMC4548476/ /pubmed/26312189 http://dx.doi.org/10.7717/peerj.1203 Text en http://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, made available under the Creative Commons Public Domain Dedication (http://creativecommons.org/publicdomain/zero/1.0/). This work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose.
title Coalescent: an open-science framework for importance sampling in coalescent theory
topic Computational Biology