Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation
The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to develop probabilistic substitution models and to infer their parameters using likelihood and Baye...
Main Authors: | Levy Karin, Eli; Shkedy, Dafna; Ashkenazy, Haim; Cartwright, Reed A.; Pupko, Tal |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2017 |
Subjects: | Research Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5438127/ https://www.ncbi.nlm.nih.gov/pubmed/28453624 http://dx.doi.org/10.1093/gbe/evx084 |
_version_ | 1783237708696518656 |
---|---|
author | Levy Karin, Eli Shkedy, Dafna Ashkenazy, Haim Cartwright, Reed A. Pupko, Tal |
author_facet | Levy Karin, Eli Shkedy, Dafna Ashkenazy, Haim Cartwright, Reed A. Pupko, Tal |
author_sort | Levy Karin, Eli |
collection | PubMed |
description | The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to develop probabilistic substitution models and to infer their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably due to the difficulty in writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need to use an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available in http://spartaabc.tau.ac.il. |
format | Online Article Text |
id | pubmed-5438127 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5438127 2017-05-24 Genome Biol Evol Research Article Oxford University Press 2017-05-01 /pmc/articles/PMC5438127/ /pubmed/28453624 http://dx.doi.org/10.1093/gbe/evx084 Text en © The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Research Article Levy Karin, Eli Shkedy, Dafna Ashkenazy, Haim Cartwright, Reed A. Pupko, Tal Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation |
title | Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5438127/ https://www.ncbi.nlm.nih.gov/pubmed/28453624 http://dx.doi.org/10.1093/gbe/evx084 |
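
The description field of this record outlines a rejection-style ABC loop: extract summary statistics from the input, sample parameters from a prior, simulate, compute the same statistics, and keep the simulations closest to the input. The Python sketch below illustrates that loop only. It is not SpartaABC's code: the toy simulator, the two summary statistics (indel count and mean indel length), the uniform priors, the scaled Euclidean distance, and every function name are assumptions chosen for illustration.

```python
import random
import statistics


def simulate_indel_lengths(rate, mean_length, n_sites, rng):
    """Toy simulator (an assumption, not SpartaABC's): each site starts an
    indel with probability `rate`; lengths are geometric with mean `mean_length`."""
    stop_probability = 1.0 / mean_length
    lengths = []
    for _ in range(n_sites):
        if rng.random() < rate:
            length = 1
            # Extend the indel until a "stop" draw occurs.
            while rng.random() >= stop_probability:
                length += 1
            lengths.append(length)
    return lengths


def summary_statistics(lengths):
    """Hypothetical summary statistics: indel count and mean indel length."""
    mean_length = statistics.fmean(lengths) if lengths else 0.0
    return (len(lengths), mean_length)


def distance(s_obs, s_sim, scales):
    """Euclidean distance after dividing each statistic by its scale."""
    return sum(((a - b) / s) ** 2 for a, b, s in zip(s_obs, s_sim, scales)) ** 0.5


def abc_rejection(observed, n_sites, n_simulations, keep_fraction, seed=0):
    """Return the (rate, mean_length) draws whose simulations are closest to the input."""
    rng = random.Random(seed)
    s_obs = summary_statistics(observed)
    # Scale by the observed values so both statistics contribute comparably.
    scales = (max(s_obs[0], 1), max(s_obs[1], 1.0))
    draws = []
    for _ in range(n_simulations):
        # Assumed uniform priors; the real priors are a modelling choice.
        rate = rng.uniform(0.0, 0.1)
        mean_length = rng.uniform(1.0, 10.0)
        simulated = simulate_indel_lengths(rate, mean_length, n_sites, rng)
        d = distance(s_obs, summary_statistics(simulated), scales)
        draws.append((d, rate, mean_length))
    draws.sort(key=lambda item: item[0])
    n_keep = max(1, round(keep_fraction * n_simulations))
    return [(rate, mean_length) for _, rate, mean_length in draws[:n_keep]]


def posterior_means(accepted):
    """Point estimates: means of the accepted parameter draws."""
    rates, lengths = zip(*accepted)
    return statistics.fmean(rates), statistics.fmean(lengths)


if __name__ == "__main__":
    # Stand-in "input data" generated with known parameters (0.03, 4.0).
    observed = simulate_indel_lengths(0.03, 4.0, 2000, random.Random(1))
    accepted = abc_rejection(observed, n_sites=2000, n_simulations=1000,
                             keep_fraction=0.05)
    print(posterior_means(accepted))
```

The accepted draws approximate the posterior distribution, and their means serve as point estimates. Keeping a fixed fraction of the closest simulations, rather than using a fixed distance threshold, is one common design choice; the record does not state which rule SpartaABC uses.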