CRISPRstrand: predicting repeat orientations to determine the crRNA-encoding strand at CRISPR loci

Bibliographic Details
Main authors: Alkhnbashi, Omer S., Costa, Fabrizio, Shah, Shiraz A., Garrett, Roger A., Saunders, Sita J., Backofen, Rolf
Format: Online, Article, Text
Language: English
Published: Oxford University Press 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4147912/
https://www.ncbi.nlm.nih.gov/pubmed/25161238
http://dx.doi.org/10.1093/bioinformatics/btu459
collection PubMed
description Motivation: The discovery of CRISPR-Cas systems almost 20 years ago rapidly changed our perception of the bacterial and archaeal immune systems. CRISPR loci consist of several repetitive DNA sequences called repeats, interspaced by stretches of variable-length sequences called spacers. This CRISPR array is transcribed and processed into multiple mature RNA species (crRNAs). A single crRNA is integrated into an interference complex, together with CRISPR-associated (Cas) proteins, to bind and degrade invading nucleic acids. Although existing bioinformatics tools can recognize CRISPR loci by their characteristic repeat-spacer architecture, they generally output CRISPR arrays of ambiguous orientation and thus do not determine the strand from which crRNAs are processed. Knowledge of the correct orientation is crucial for many tasks, including the classification of CRISPR conservation, the detection of leader regions, the identification of target sites (protospacers) on invading genetic elements and the characterization of protospacer-adjacent motifs.
Results: We present a fast and accurate tool that determines the crRNA-encoding strand at CRISPR loci by predicting the correct orientation of repeats with an advanced machine learning approach. Both the repeat sequence and mutation information were encoded and processed by an efficient graph kernel to learn higher-order correlations. The model was trained and tested on curated data comprising more than 4500 CRISPRs and achieved an area under the receiver operating characteristic curve (AUC ROC) of 0.95. In addition, we show that accurate orientation information greatly improves the detection of conserved repeat sequence families and structure motifs. We integrated CRISPRstrand predictions into our CRISPRmap web server of CRISPR conservation and updated the latter to version 2.0.
Availability: CRISPRmap and CRISPRstrand are available at http://rna.informatik.uni-freiburg.de/CRISPRmap.
Contact: backofen@informatik.uni-freiburg.de
Supplementary information: Supplementary data are available at Bioinformatics online.
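As an illustration of the orientation ambiguity described above (not CRISPRstrand's method or code), the Python sketch below assumes that a CRISPR detector reports an array as an ordered list of repeat and spacer strings. Reading the same locus on the opposite DNA strand reverses the order of the elements and reverse-complements each one, so determining the crRNA-encoding strand amounts to choosing between two candidate arrays. The helper names and the toy sequences are hypothetical.

```python
from __future__ import annotations

# Hypothetical helpers for illustration only; not part of CRISPRstrand.
# Complement table for DNA; N (unknown base) maps to itself.
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def flip_array(elements: list[str]) -> list[str]:
    """Return a repeat/spacer array as it reads on the opposite strand."""
    # The opposite strand runs 5'->3' in the other direction, so the element
    # order is reversed and every element is reverse-complemented.
    return [reverse_complement(e) for e in reversed(elements)]


def candidate_orientations(elements: list[str]) -> dict[str, list[str]]:
    """Return both candidate orientations of a reported CRISPR array."""
    return {"as_reported": list(elements), "opposite_strand": flip_array(elements)}


if __name__ == "__main__":
    # Toy repeat-spacer-repeat array (made-up sequences).
    toy = ["GTTTTAGA", "AACCGGTA", "GTTTTAGA"]
    print(candidate_orientations(toy)["opposite_strand"])
    # -> ['TCTAAAAC', 'TACCGGTT', 'TCTAAAAC']
```

CRISPRstrand itself makes this choice with a trained graph-kernel model over repeat sequence and mutation information; the sketch only constructs the two alternatives such a predictor has to decide between.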
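The reported 0.95 AUC ROC has a simple interpretation: the area under the ROC curve equals the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one, with ties counted as one half. The Python sketch below computes the metric directly from that definition; it is a generic illustration, not the evaluation code used for CRISPRstrand, and the function name, scores and labelling are hypothetical.

```python
from __future__ import annotations

# Illustrative AUC ROC computation via its rank interpretation:
# AUC = P(score of a random positive > score of a random negative),
# with ties counted as 1/2. Quadratic in the number of examples.


def auc_roc(scores: list[float], labels: list[int]) -> float:
    """Area under the ROC curve for binary labels (1 = positive, 0 = negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0  # positive ranked above negative
            elif p == n:
                wins += 0.5  # tie counts as half a win
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy classifier scores; label 1 could mean "array already in the
    # crRNA-encoding orientation" (hypothetical labelling).
    print(auc_roc([0.9, 0.8, 0.3, 0.2], [1, 0, 1, 0]))  # -> 0.75
```

In practice a library routine such as scikit-learn's roc_auc_score computes the same quantity more efficiently by sorting the scores instead of comparing every positive/negative pair.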
id pubmed-4147912
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics
dates 2014-09-01 (issue), 2014-08-22 (online)
rights © The Author 2014. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
topic Eccb 2014 Proceedings Papers Committee