Coalescent: an open-science framework for importance sampling in coalescent theory
Background. In coalescent theory, computer programs often use importance sampling to calculate likelihoods and other statistical quantities. An importance sampling scheme can exploit human intuition to improve statistical efficiency of computations, but unfortunately, in the absence of general compu...
Main authors: | Tewari, Susanta; Spouge, John L. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | PeerJ Inc. 2015 |
Subjects: | Computational Biology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4548476/ https://www.ncbi.nlm.nih.gov/pubmed/26312189 http://dx.doi.org/10.7717/peerj.1203 |
_version_ | 1782387199116836864 |
---|---|
author | Tewari, Susanta Spouge, John L. |
author_facet | Tewari, Susanta Spouge, John L. |
author_sort | Tewari, Susanta |
collection | PubMed |
description | Background. In coalescent theory, computer programs often use importance sampling to calculate likelihoods and other statistical quantities. An importance sampling scheme can exploit human intuition to improve statistical efficiency of computations, but unfortunately, in the absence of general computer frameworks on importance sampling, researchers often struggle to translate new sampling schemes computationally or benchmark against different schemes, in a manner that is reliable and maintainable. Moreover, most studies use computer programs lacking a convenient user interface or the flexibility to meet the current demands of open science. In particular, current computer frameworks can only evaluate the efficiency of a single importance sampling scheme or compare the efficiencies of different schemes in an ad hoc manner. Results. We have designed a general framework (http://coalescent.sourceforge.net; language: Java; License: GPLv3) for importance sampling that computes likelihoods under the standard neutral coalescent model of a single, well-mixed population of constant size over time following the infinite sites model of mutation. The framework models the necessary core concepts, comes integrated with several data sets of varying size, implements the standard competing proposals, and integrates tightly with our previous framework for calculating exact probabilities. For a given dataset, it computes the likelihood and provides the maximum likelihood estimate of the mutation parameter. Well-known benchmarks in the coalescent literature validate the accuracy of the framework. The framework provides an intuitive user interface with minimal clutter. For performance, the framework switches automatically to modern multicore hardware, if available. It runs on three major platforms (Windows, Mac and Linux). Extensive tests and coverage make the framework reliable and maintainable. Conclusions. In coalescent theory, many studies of computational efficiency consider only effective sample size. Here, we evaluate proposals in the coalescent literature, to discover that the order of efficiency among the three importance sampling schemes changes when one considers running time as well as effective sample size. We also describe a computational technique called “just-in-time delegation” available to improve the trade-off between running time and precision by constructing improved importance sampling schemes from existing ones. Thus, our systems approach is a potential solution to the “2^8 programs problem” highlighted by Felsenstein, because it provides the flexibility to include or exclude various features of similar coalescent models or importance sampling schemes. |
format | Online Article Text |
id | pubmed-4548476 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2015 |
publisher | PeerJ Inc. |
record_format | MEDLINE/PubMed |
spelling | pubmed-4548476 2015-08-26 Coalescent: an open-science framework for importance sampling in coalescent theory Tewari, Susanta Spouge, John L. PeerJ Computational Biology Background. In coalescent theory, computer programs often use importance sampling to calculate likelihoods and other statistical quantities. An importance sampling scheme can exploit human intuition to improve statistical efficiency of computations, but unfortunately, in the absence of general computer frameworks on importance sampling, researchers often struggle to translate new sampling schemes computationally or benchmark against different schemes, in a manner that is reliable and maintainable. Moreover, most studies use computer programs lacking a convenient user interface or the flexibility to meet the current demands of open science. In particular, current computer frameworks can only evaluate the efficiency of a single importance sampling scheme or compare the efficiencies of different schemes in an ad hoc manner. Results. We have designed a general framework (http://coalescent.sourceforge.net; language: Java; License: GPLv3) for importance sampling that computes likelihoods under the standard neutral coalescent model of a single, well-mixed population of constant size over time following the infinite sites model of mutation. The framework models the necessary core concepts, comes integrated with several data sets of varying size, implements the standard competing proposals, and integrates tightly with our previous framework for calculating exact probabilities. For a given dataset, it computes the likelihood and provides the maximum likelihood estimate of the mutation parameter. Well-known benchmarks in the coalescent literature validate the accuracy of the framework. The framework provides an intuitive user interface with minimal clutter. For performance, the framework switches automatically to modern multicore hardware, if available. It runs on three major platforms (Windows, Mac and Linux). Extensive tests and coverage make the framework reliable and maintainable. Conclusions. In coalescent theory, many studies of computational efficiency consider only effective sample size. Here, we evaluate proposals in the coalescent literature, to discover that the order of efficiency among the three importance sampling schemes changes when one considers running time as well as effective sample size. We also describe a computational technique called “just-in-time delegation” available to improve the trade-off between running time and precision by constructing improved importance sampling schemes from existing ones. Thus, our systems approach is a potential solution to the “2^8 programs problem” highlighted by Felsenstein, because it provides the flexibility to include or exclude various features of similar coalescent models or importance sampling schemes. PeerJ Inc. 2015-08-18 /pmc/articles/PMC4548476/ /pubmed/26312189 http://dx.doi.org/10.7717/peerj.1203 Text en http://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, made available under the Creative Commons Public Domain Dedication (http://creativecommons.org/publicdomain/zero/1.0/). This work may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. |
spellingShingle | Computational Biology Tewari, Susanta Spouge, John L. Coalescent: an open-science framework for importance sampling in coalescent theory |
title | Coalescent: an open-science framework for importance sampling in coalescent theory |
title_full | Coalescent: an open-science framework for importance sampling in coalescent theory |
title_fullStr | Coalescent: an open-science framework for importance sampling in coalescent theory |
title_full_unstemmed | Coalescent: an open-science framework for importance sampling in coalescent theory |
title_short | Coalescent: an open-science framework for importance sampling in coalescent theory |
title_sort | coalescent: an open-science framework for importance sampling in coalescent theory |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4548476/ https://www.ncbi.nlm.nih.gov/pubmed/26312189 http://dx.doi.org/10.7717/peerj.1203 |
work_keys_str_mv | AT tewarisusanta coalescentanopenscienceframeworkforimportancesamplingincoalescenttheory AT spougejohnl coalescentanopenscienceframeworkforimportancesamplingincoalescenttheory |
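
Illustrative note: the description above reports that the ranking of importance sampling schemes can change once running time is considered alongside effective sample size. The sketch below shows how such a reversal can arise. It is not taken from the Coalescent framework (which is written in Java) and does not reproduce its API; it assumes the common Kish definition of effective sample size, which may differ from the one used in the article, and the function names, weights, and timings are hypothetical values chosen only for illustration.

```python
def effective_sample_size(weights):
    """Kish effective sample size: (sum of w)^2 / (sum of w^2).

    Assumption: this common definition may not match the article's.
    """
    total = sum(weights)
    total_sq = sum(w * w for w in weights)
    if total_sq == 0.0:
        raise ValueError("all importance weights are zero")
    return total * total / total_sq


def ess_per_second(weights, seconds):
    """Effective sample size normalised by running time."""
    if seconds <= 0:
        raise ValueError("running time must be positive")
    return effective_sample_size(weights) / seconds


# Hypothetical importance weights and running times for two schemes.
# Scheme A has perfectly even weights but is slow;
# scheme B has uneven weights but is fast.
schemes = {
    "A": {"weights": [1.0, 1.0, 1.0, 1.0], "seconds": 4.0},
    "B": {"weights": [3.0, 1.0, 0.0, 0.0], "seconds": 1.0},
}

for name, scheme in schemes.items():
    ess = effective_sample_size(scheme["weights"])
    rate = ess_per_second(scheme["weights"], scheme["seconds"])
    print(f"scheme {name}: ESS = {ess:.2f}, ESS per second = {rate:.2f}")
```

With these made-up numbers, scheme A has the larger effective sample size, while scheme B delivers more effective samples per second, so the two criteria rank the schemes in opposite orders.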