
Probabilistic models for CRISPR spacer content evolution


Bibliographic details
Main authors: Kupczok, Anne; Bollback, Jonathan P
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3704272/
https://www.ncbi.nlm.nih.gov/pubmed/23442002
http://dx.doi.org/10.1186/1471-2148-13-54
author Kupczok, Anne
Bollback, Jonathan P
collection PubMed
description BACKGROUND: The CRISPR/Cas system is known to act as an adaptive and heritable immune system in Eubacteria and Archaea. Immunity is encoded in an array of spacer sequences. Each spacer can provide specific immunity to invasive elements that carry the same or a similar sequence. Even in closely related strains, spacer content is very dynamic and evolves quickly. Standard models of nucleotide evolution cannot be applied to quantify its rate of change since processes other than single-nucleotide changes determine its evolution. METHODS: We present probabilistic models that are specific for spacer content evolution. They account for the different processes of insertion and deletion. Insertions can be constrained to occur on one end only or are allowed to occur throughout the array. One deletion event can affect one spacer or a whole fragment of adjacent spacers. Parameters of the underlying models are estimated for a pair of arrays by maximum likelihood using explicit ancestor enumeration. RESULTS: Simulations show that parameters are well estimated on average under the models presented here. There is a bias in the rate estimation when including fragment deletions. The models also estimate times between pairs of strains. But with increasing time, spacer overlap goes to zero, and thus there is an upper bound on the distance that can be estimated. Spacer content similarities are displayed in a distance-based phylogeny using the estimated times. We use the presented models to analyze different Yersinia pestis data sets and find that the results among them are largely congruent. The models also capture the variation in diversity of spacers among the data sets. A comparison of spacer-based phylogenies and Cas gene phylogenies shows that they resolve very different time scales for this data set.
CONCLUSIONS: The simulations and data analyses show that the presented models are useful for quantifying spacer content evolution and for displaying spacer content similarities of closely related strains in a phylogeny. This allows for comparisons of different CRISPR arrays or for comparisons between CRISPR arrays and nucleotide substitution rates.
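As a rough illustration of why vanishing spacer overlap puts an upper bound on estimable distances, the Python sketch below computes a closed-form maximum-likelihood divergence for a pair of arrays. It uses a deliberately simplified model that is an assumption here, not the authors' method: spacer order, the one-end insertion constraint and fragment deletions are ignored, and no ancestor enumeration is performed. Spacers are assumed to be acquired at rate θ and lost independently at rate ρ, the ancestral array is assumed to be at stationarity, and both lineages are assumed to evolve for time t after the split. Under these assumptions the shared and array-specific spacer counts are independent Poisson variables, so the estimate of ρt depends only on the three counts. The function name `estimate_divergence` is hypothetical.

```python
import math
from typing import Optional


def estimate_divergence(shared: int, unique_a: int, unique_b: int) -> Optional[float]:
    """Closed-form ML estimate of rho * t for two spacer arrays (hypothetical helper).

    Simplified, order-free model (an assumption, not the paper's model):
      * spacers are acquired at rate theta and each is lost independently at rate rho;
      * the ancestral array is at stationarity, i.e. Poisson with mean lam = theta / rho;
      * each lineage evolves for time t after the split, and new spacers are never shared.
    With p = exp(-2 * rho * t) the counts are independent Poisson variables:
      shared ~ Poisson(lam * p); unique_a and unique_b ~ Poisson(lam * (1 - p)).
    """
    if min(shared, unique_a, unique_b) < 0:
        raise ValueError("spacer counts must be non-negative")
    if shared == 0:
        # No overlap: the likelihood is maximized only in the limit t -> infinity,
        # so no finite divergence can be estimated.
        return None
    mu_shared = float(shared)                # MLE of lam * p
    mu_unique = (unique_a + unique_b) / 2.0  # MLE of lam * (1 - p), pooled over both arrays
    # Invert p = mu_shared / (mu_shared + mu_unique) = exp(-2 * rho * t).
    return 0.5 * math.log((mu_shared + mu_unique) / mu_shared)


if __name__ == "__main__":
    print(estimate_divergence(6, 3, 3))  # about 0.2027
    print(estimate_divergence(0, 5, 4))  # None
```

For example, `estimate_divergence(6, 3, 3)` gives 0.5·ln 1.5 ≈ 0.203, while `estimate_divergence(0, 5, 4)` returns `None`: with no shared spacers the likelihood keeps increasing as t grows, which mirrors the upper bound on estimable distance noted in the abstract. Because only the product ρt is identifiable here, times are expressed in units of the expected spacer lifetime 1/ρ.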
format Online
Article
Text
id pubmed-3704272
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3704272 2013-07-12 Probabilistic models for CRISPR spacer content evolution Kupczok, Anne Bollback, Jonathan P BMC Evol Biol Methodology Article BioMed Central 2013-02-26 /pmc/articles/PMC3704272/ /pubmed/23442002 http://dx.doi.org/10.1186/1471-2148-13-54 Text en Copyright © 2013 Kupczok and Bollback; licensee BioMed Central Ltd. http://www.creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://www.creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Probabilistic models for CRISPR spacer content evolution
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3704272/
https://www.ncbi.nlm.nih.gov/pubmed/23442002
http://dx.doi.org/10.1186/1471-2148-13-54