
Casboundary: automated definition of integral Cas cassettes

MOTIVATION: CRISPR-Cas are important systems found in most archaeal and many bacterial genomes, providing adaptive immunity against mobile genetic elements in prokaryotes. The CRISPR-Cas systems are encoded by a set of consecutive cas genes, here termed cassette. The identification of cassette bound...


Bibliographic Details
Main authors: Padilha, Victor A; Alkhnbashi, Omer S; Tran, Van Dinh; Shah, Shiraz A; Carvalho, André C P L F; Backofen, Rolf
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8208735/
https://www.ncbi.nlm.nih.gov/pubmed/33226067
http://dx.doi.org/10.1093/bioinformatics/btaa984
author Padilha, Victor A
Alkhnbashi, Omer S
Tran, Van Dinh
Shah, Shiraz A
Carvalho, André C P L F
Backofen, Rolf
collection PubMed
description MOTIVATION: CRISPR-Cas systems are found in most archaeal and many bacterial genomes, where they provide adaptive immunity against mobile genetic elements. A CRISPR-Cas system is encoded by a set of consecutive cas genes, here termed a cassette. Identifying cassette boundaries is key to finding cassettes in the CRISPR research field, and is often carried out using Hidden Markov Models and manual annotation. In this article, we propose the first method able to automatically define the cassette boundaries. In addition, we present a Cas-type predictive model that the method uses to assign each gene located within a cassette's boundaries a Cas label from a set of pre-defined Cas types. Furthermore, the proposed method can detect potentially new cas genes and decompose a cassette into its modules. RESULTS: We evaluate the predictive performance of our proposed method on data collected from the two most recent CRISPR classification studies. In our experiments, we obtain an average similarity of 0.86 between the predicted and expected cassettes. We also achieve F-scores above 0.9 for the classification of cas genes of known types and 0.73 for the unknown ones. Finally, we conduct two additional case studies, in which we investigate the occurrence of potentially new cas genes and the occurrence of module exchange between different genomes. AVAILABILITY AND IMPLEMENTATION: https://github.com/BackofenLab/Casboundary. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8208735
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8208735 2021-06-17 Casboundary: automated definition of integral Cas cassettes Padilha, Victor A Alkhnbashi, Omer S Tran, Van Dinh Shah, Shiraz A Carvalho, André C P L F Backofen, Rolf Bioinformatics Original Papers Oxford University Press 2020-12-06 /pmc/articles/PMC8208735/ /pubmed/33226067 http://dx.doi.org/10.1093/bioinformatics/btaa984 Text en © The Author(s) 2020. Published by Oxford University Press.
https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Casboundary: automated definition of integral Cas cassettes
topic Original Papers