Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation

The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and to inferring their parameters using likelihood and Baye...

Bibliographic Details
Main authors: Levy Karin, Eli; Shkedy, Dafna; Ashkenazy, Haim; Cartwright, Reed A.; Pupko, Tal
Format: Online Article Text
Language: English
Published: Oxford University Press 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5438127/
https://www.ncbi.nlm.nih.gov/pubmed/28453624
http://dx.doi.org/10.1093/gbe/evx084
author Levy Karin, Eli
Shkedy, Dafna
Ashkenazy, Haim
Cartwright, Reed A.
Pupko, Tal
collection PubMed
description The most common evolutionary events at the molecular level are single-base substitutions, as well as insertions and deletions (indels) of short DNA segments. A large body of research has been devoted to developing probabilistic substitution models and to inferring their parameters using likelihood and Bayesian approaches. In contrast, relatively little has been done to model indel dynamics, probably due to the difficulty in writing explicit likelihood functions. Here, we contribute to the effort of modeling indel dynamics by presenting SpartaABC, an approximate Bayesian computation (ABC) approach to infer indel parameters from sequence data (either aligned or unaligned). SpartaABC circumvents the need to use an explicit likelihood function by extracting summary statistics from simulated sequences. First, summary statistics are extracted from the input sequence data. Second, SpartaABC samples indel parameters from a prior distribution and uses them to simulate sequences. Third, it computes summary statistics from the simulated sets of sequences. By computing a distance between the summary statistics extracted from the input and those of each simulation, SpartaABC can provide an approximation to the posterior distribution of indel parameters as well as point estimates. We study the performance of our methodology and show that it provides accurate estimates of indel parameters in simulations. We next demonstrate the utility of SpartaABC by studying the impact of alignment errors on the inference of positive selection. A C++ program implementing SpartaABC is freely available at http://spartaabc.tau.ac.il.
format Online
Article
Text
id pubmed-5438127
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-5438127 2017-05-24 Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation Levy Karin, Eli; Shkedy, Dafna; Ashkenazy, Haim; Cartwright, Reed A.; Pupko, Tal Genome Biol Evol Research Article Oxford University Press 2017-05-01 /pmc/articles/PMC5438127/ /pubmed/28453624 http://dx.doi.org/10.1093/gbe/evx084 Text en
© The Author 2017. Published by Oxford University Press on behalf of the Society for Molecular Biology and Evolution. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Inferring Rates and Length-Distributions of Indels Using Approximate Bayesian Computation
topic Research Article
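
The description field above outlines an ABC loop in three steps: extract summary statistics from the input, sample parameters from a prior and simulate, then compare summary statistics by a distance. The following is a minimal Python sketch of that general rejection-ABC idea. It is not the SpartaABC C++ implementation. Everything specific here is an assumption made for illustration: the toy simulator (each site starts an indel with a fixed probability, with geometric indel lengths), the uniform priors, the two summary statistics, the scaled Euclidean distance, and all function names.

```python
import math
import random


def simulate_indels(rate, p_len, n_sites, rng):
    """Toy simulator (assumption): each site starts an indel with probability
    `rate`; each indel length is geometric with success probability `p_len`."""
    lengths = []
    for _ in range(n_sites):
        if rng.random() < rate:
            length = 1
            while rng.random() > p_len:
                length += 1
            lengths.append(length)
    return lengths


def summary_stats(lengths):
    """Two illustrative statistics: number of indels and mean indel length."""
    n = len(lengths)
    mean_len = sum(lengths) / n if n else 0.0
    return (float(n), mean_len)


def distance(s_obs, s_sim, scales):
    """Euclidean distance after dividing each statistic by its scale."""
    return math.sqrt(sum(((a - b) / s) ** 2
                         for a, b, s in zip(s_obs, s_sim, scales)))


def abc_rejection(observed, n_sites, n_sims, keep_frac, rng):
    """Return the parameter draws whose simulations are closest to the data."""
    s_obs = summary_stats(observed)
    draws = []
    for _ in range(n_sims):
        # Hypothetical uniform priors over the two toy parameters.
        rate = rng.uniform(0.0, 0.1)
        p_len = rng.uniform(0.05, 1.0)
        s_sim = summary_stats(simulate_indels(rate, p_len, n_sites, rng))
        draws.append((rate, p_len, s_sim))

    # Scale each statistic by its standard deviation across simulations,
    # so that neither statistic dominates the distance.
    scales = []
    for i in range(len(s_obs)):
        vals = [d[2][i] for d in draws]
        mean = sum(vals) / len(vals)
        sd = math.sqrt(sum((v - mean) ** 2 for v in vals) / len(vals))
        scales.append(sd or 1.0)

    ranked = sorted(draws, key=lambda d: distance(s_obs, d[2], scales))
    n_keep = max(1, int(keep_frac * n_sims))
    return [(rate, p_len) for rate, p_len, _ in ranked[:n_keep]]


def posterior_mean(samples):
    """Point estimate: the mean of the accepted parameter draws."""
    n = len(samples)
    return (sum(r for r, _ in samples) / n, sum(p for _, p in samples) / n)
```

To try it, one could generate "observed" data with known parameters, for example `simulate_indels(0.05, 0.5, 2000, random.Random(1))`, pass the result to `abc_rejection`, and check that `posterior_mean` of the accepted draws lands near the generating values. The accepted draws together approximate the posterior; keeping a smaller fraction tightens the approximation at the cost of fewer samples.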