
Searching for fat tails in CRISPR-Cas systems: Data analysis and mathematical modeling

Understanding CRISPR-Cas systems—the adaptive defence mechanism that about half of bacterial species and most of archaea use to neutralise viral attacks—is important for explaining the biodiversity observed in the microbial world as well as for editing animal and plant genomes effectively. The CRISP...


Bibliographic Details
Main Authors: Pavlova, Yekaterina S., Paez-Espino, David, Morozov, Andrew Yu., Belalov, Ilya S.
Format: Online Article Text
Language: English
Published: Public Library of Science 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8026048/
https://www.ncbi.nlm.nih.gov/pubmed/33770071
http://dx.doi.org/10.1371/journal.pcbi.1008841
_version_ 1783675601342693376
author Pavlova, Yekaterina S.
Paez-Espino, David
Morozov, Andrew Yu.
Belalov, Ilya S.
author_facet Pavlova, Yekaterina S.
Paez-Espino, David
Morozov, Andrew Yu.
Belalov, Ilya S.
author_sort Pavlova, Yekaterina S.
collection PubMed
description Understanding CRISPR-Cas systems—the adaptive defence mechanism that about half of bacterial species and most of archaea use to neutralise viral attacks—is important for explaining the biodiversity observed in the microbial world as well as for editing animal and plant genomes effectively. The CRISPR-Cas system learns from previous viral infections and integrates small pieces from phage genomes called spacers into the microbial genome. The resulting library of spacers collected in CRISPR arrays is then compared with the DNA of potential invaders. One of the most intriguing and least well understood questions about CRISPR-Cas systems is the distribution of spacers across the microbial population. Here, using empirical data, we show that the global distribution of spacer numbers in CRISPR arrays across multiple biomes worldwide typically exhibits scale-invariant power law behaviour, and the standard deviation is greater than the sample mean. We develop a mathematical model of spacer loss and acquisition dynamics which fits observed data from almost four thousand metagenomes well. In analogy to the classical ‘rich-get-richer’ mechanism of power law emergence, the rate of spacer acquisition is proportional to the CRISPR array size, which allows a small proportion of CRISPRs within the population to possess a significant number of spacers. Our study provides an alternative explanation for the rarity of all-resistant super microbes in nature and why proliferation of phages can be highly successful despite the effectiveness of CRISPR-Cas systems.
format Online
Article
Text
id pubmed-8026048
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-8026048 2021-04-15 Searching for fat tails in CRISPR-Cas systems: Data analysis and mathematical modeling Pavlova, Yekaterina S. Paez-Espino, David Morozov, Andrew Yu. Belalov, Ilya S. PLoS Comput Biol Research Article
Public Library of Science 2021-03-26 /pmc/articles/PMC8026048/ /pubmed/33770071 http://dx.doi.org/10.1371/journal.pcbi.1008841 Text en https://creativecommons.org/publicdomain/zero/1.0/ This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 (https://creativecommons.org/publicdomain/zero/1.0/) public domain dedication.
spellingShingle Research Article
Pavlova, Yekaterina S.
Paez-Espino, David
Morozov, Andrew Yu.
Belalov, Ilya S.
Searching for fat tails in CRISPR-Cas systems: Data analysis and mathematical modeling
title Searching for fat tails in CRISPR-Cas systems: Data analysis and mathematical modeling
title_full Searching for fat tails in CRISPR-Cas systems: Data analysis and mathematical modeling
title_fullStr Searching for fat tails in CRISPR-Cas systems: Data analysis and mathematical modeling
title_full_unstemmed Searching for fat tails in CRISPR-Cas systems: Data analysis and mathematical modeling
title_short Searching for fat tails in CRISPR-Cas systems: Data analysis and mathematical modeling
title_sort searching for fat tails in crispr-cas systems: data analysis and mathematical modeling
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8026048/
https://www.ncbi.nlm.nih.gov/pubmed/33770071
http://dx.doi.org/10.1371/journal.pcbi.1008841
work_keys_str_mv AT pavlovayekaterinas searchingforfattailsincrisprcassystemsdataanalysisandmathematicalmodeling
AT paezespinodavid searchingforfattailsincrisprcassystemsdataanalysisandmathematicalmodeling
AT morozovandrewyu searchingforfattailsincrisprcassystemsdataanalysisandmathematicalmodeling
AT belalovilyas searchingforfattailsincrisprcassystemsdataanalysisandmathematicalmodeling
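
Illustrative note on the description field: the abstract states that spacer counts follow a power law and that their standard deviation exceeds the sample mean. The minimal Python sketch below shows why that combination signals a fat tail, by comparing a truncated discrete power law with a thin-tailed geometric distribution of roughly the same mean. The exponent (alpha = 2), the cutoff (k_max = 1000) and all function names are assumptions chosen for illustration only; none of them come from the article, and this is not the authors' model of spacer loss and acquisition.

```python
import math


def truncated_power_law(alpha, k_max):
    """Probabilities p(k) proportional to k**(-alpha) for k = 1..k_max (assumed form)."""
    weights = [k ** (-alpha) for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def truncated_geometric(mean, k_max):
    """Geometric distribution on k = 1..k_max with success probability 1/mean,
    renormalised after truncation (thin-tailed comparison case)."""
    p = 1.0 / mean
    weights = [(1.0 - p) ** (k - 1) * p for k in range(1, k_max + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def mean_and_std(probs):
    """Mean and standard deviation of a distribution on k = 1..len(probs)."""
    mean = sum(k * p for k, p in enumerate(probs, start=1))
    second_moment = sum(k * k * p for k, p in enumerate(probs, start=1))
    return mean, math.sqrt(second_moment - mean ** 2)


# Hypothetical parameters, not taken from the article.
ALPHA = 2.0
K_MAX = 1000

pl_mean, pl_std = mean_and_std(truncated_power_law(ALPHA, K_MAX))
geo_mean, geo_std = mean_and_std(truncated_geometric(pl_mean, K_MAX))

print(f"power law: mean={pl_mean:.2f}, std={pl_std:.2f}")
print(f"geometric: mean={geo_mean:.2f}, std={geo_std:.2f}")
```

Under these assumed parameters the power law has a standard deviation several times its mean, while the geometric distribution with a comparable mean has a standard deviation below its mean. This is the sense in which "standard deviation greater than the sample mean" is used as a quick indicator of a heavy tail.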