CASowary: CRISPR-Cas13 guide RNA predictor for transcript depletion

BACKGROUND: The recent discovery of the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and CRISPR-associated protein (Cas) gene editing system has resulted in its widespread use for improved understanding of a variety of biological systems. Cas13, a lesser studied Cas protein, has been rep...

Bibliographic Details
Main authors: Krohannon, Alexander, Srivastava, Mansi, Rauch, Simone, Srivastava, Rajneesh, Dickinson, Bryan C., Janga, Sarath Chandra
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8889671/
https://www.ncbi.nlm.nih.gov/pubmed/35236300
http://dx.doi.org/10.1186/s12864-022-08366-2
_version_ 1784661455142912000
author Krohannon, Alexander
Srivastava, Mansi
Rauch, Simone
Srivastava, Rajneesh
Dickinson, Bryan C.
Janga, Sarath Chandra
collection PubMed
description BACKGROUND: The recent discovery of the CRISPR (Clustered Regularly Interspaced Short Palindromic Repeats) and CRISPR-associated protein (Cas) gene editing system has resulted in its widespread use for improved understanding of a variety of biological systems. Cas13, a lesser studied Cas protein, has been repurposed to allow for efficient and precise editing of RNA molecules. The Cas13 system relies on base complementarity between a crRNA/sgRNA (CRISPR RNA or single guide RNA) and a target RNA transcript to bind preferentially to the target transcript alone. Compared with the upstream regulatory regions of protein-coding genes in the genome, the transcriptome is significantly more redundant, with many transcripts sharing long stretches of identical nucleotide sequence. Transcripts also exhibit complex three-dimensional structures and interact with an array of RNA binding proteins (RBPs), both of which may affect how effectively a target sequence is depleted. However, our understanding of the features, and of the corresponding methods, that can predict whether a specific sgRNA will effectively knock down a transcript is very limited.
RESULTS: Here we present a novel machine learning and computational tool, CASowary, to predict the efficacy of an sgRNA. We used publicly available RNA knockdown data from Cas13 characterization experiments for 555 sgRNAs targeting the transcriptome in HEK293 cells, in conjunction with transcriptome-wide protein occupancy information. Our model uses a Decision Tree architecture with a set of 112 sequence and target availability features to classify sgRNA efficacy into one of four classes, based on the expected level of target transcript knockdown. After accounting for noise in the training data set, the noise-normalized accuracy exceeds 70%. Additionally, highly effective sgRNA predictions have been experimentally validated using an independent RNA-targeting Cas system, CIRTS, confirming the robustness and reproducibility of our model's sgRNA predictions. By comparing a transcriptome-wide protein occupancy map generated using POP-seq in HeLa cells against a publicly available protein-RNA interaction map in HEK293 cells, we show that CASowary can predict high-quality guides for numerous transcripts in a cell line-specific manner.
CONCLUSIONS: Application of CASowary to whole transcriptomes should enable rapid deployment of CRISPR/Cas13 systems, facilitating the development of therapeutic interventions linked with aberrations in RNA regulatory processes.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12864-022-08366-2.
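The description above mentions three ideas that a short example may make more concrete: guide-target base complementarity, per-guide sequence features, and sorting knockdown levels into four classes. The following is a minimal Python sketch under stated assumptions, not the CASowary implementation. The function names, the feature set, and the class cutoffs are all hypothetical; the record does not state the 112 features or the class boundaries.

```python
# Illustrative sketch only. Function names, the feature set, and the class
# cutoffs are assumptions; none of them come from the CASowary paper or code.

# Watson-Crick pairing over the RNA alphabet.
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def guide_for_target(target_rna: str) -> str:
    """Return the guide (5'->3') fully complementary to an RNA target site."""
    return "".join(COMPLEMENT[base] for base in reversed(target_rna.upper()))


def sequence_features(guide: str) -> dict:
    """Compute a few simple sequence features for one guide (hypothetical set)."""
    guide = guide.upper()
    features = {"gc_content": sum(b in "GC" for b in guide) / len(guide)}
    # Position-specific one-hot encoding of each nucleotide.
    for pos, base in enumerate(guide):
        for nt in "ACGU":
            features[f"pos{pos}_{nt}"] = int(base == nt)
    return features


def knockdown_class(fraction_depleted: float, cutoffs=(0.25, 0.50, 0.75)) -> int:
    """Bin a knockdown level in [0, 1] into one of four classes (0 = weakest).

    The cutoffs are placeholders: the source does not give the boundaries.
    """
    return sum(fraction_depleted >= c for c in cutoffs)
```

For instance, `guide_for_target("AUGC")` returns `"GCAU"`. Feature dictionaries of this kind, paired with class labels, are the sort of input a decision tree classifier could be trained on.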
format Online
Article
Text
id pubmed-8889671
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8889671 2022-03-09 CASowary: CRISPR-Cas13 guide RNA predictor for transcript depletion Krohannon, Alexander Srivastava, Mansi Rauch, Simone Srivastava, Rajneesh Dickinson, Bryan C. Janga, Sarath Chandra BMC Genomics Software BioMed Central 2022-03-02 /pmc/articles/PMC8889671/ /pubmed/35236300 http://dx.doi.org/10.1186/s12864-022-08366-2 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title CASowary: CRISPR-Cas13 guide RNA predictor for transcript depletion
topic Software