Deep sampling of gRNA in the human genome and deep-learning-informed prediction of gRNA activities

Life science studies involving clustered regularly interspaced short palindromic repeat (CRISPR) editing generally apply the best-performing guide RNA (gRNA) for a gene of interest. Computational models are combined with massive experimental quantification on synthetic gRNA-target libraries to accur...

Bibliographic Details
Main authors: Zhang, Heng; Yan, Jianfeng; Lu, Zhike; Zhou, Yangfan; Zhang, Qingfeng; Cui, Tingting; Li, Yini; Chen, Hui; Ma, Lijia
Format: Online Article Text
Language: English
Published: Springer Nature Singapore 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10188485/
https://www.ncbi.nlm.nih.gov/pubmed/37193681
http://dx.doi.org/10.1038/s41421-023-00549-9
author Zhang, Heng
Yan, Jianfeng
Lu, Zhike
Zhou, Yangfan
Zhang, Qingfeng
Cui, Tingting
Li, Yini
Chen, Hui
Ma, Lijia
collection PubMed
description Life science studies involving clustered regularly interspaced short palindromic repeat (CRISPR) editing generally apply the best-performing guide RNA (gRNA) for a gene of interest. Computational models are combined with massive experimental quantification on synthetic gRNA-target libraries to accurately predict gRNA activity and mutational patterns. However, the measurements are inconsistent between studies due to differences in the designs of the gRNA-target pair constructs, and there has not yet been an integrated investigation that concurrently focuses on multiple facets of gRNA capacity. In this study, we analyzed the DNA double-strand break (DSB)-induced repair outcomes and measured SpCas9/gRNA activities at both matched and mismatched locations using 926,476 gRNAs covering 19,111 protein-coding genes and 20,268 non-coding genes. We developed machine learning models to forecast the on-target cleavage efficiency (AIdit_ON), off-target cleavage specificity (AIdit_OFF), and mutational profiles (AIdit_DSB) of SpCas9/gRNA from a uniformly collected and processed dataset by deep sampling and massively quantifying gRNA capabilities in K562 cells. Each of these models exhibited superlative performance in predicting SpCas9/gRNA activities on independent datasets when benchmarked with previous models. A previously unknown parameter was also empirically determined regarding the “sweet spot” in the size of datasets used to establish an effective model to predict gRNA capabilities at a manageable experimental scale. In addition, we observed cell type-specific mutational profiles and were able to link nucleotidylexotransferase as the key factor driving these outcomes. These massive datasets and deep learning algorithms have been implemented into the user-friendly web service http://crispr-aidit.com to evaluate and rank gRNAs for life science studies.
format Online
Article
Text
id pubmed-10188485
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Springer Nature Singapore
record_format MEDLINE/PubMed
spelling pubmed-10188485 2023-05-18 Cell Discov Article Springer Nature Singapore 2023-05-16 /pmc/articles/PMC10188485/ /pubmed/37193681 http://dx.doi.org/10.1038/s41421-023-00549-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title Deep sampling of gRNA in the human genome and deep-learning-informed prediction of gRNA activities
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10188485/
https://www.ncbi.nlm.nih.gov/pubmed/37193681
http://dx.doi.org/10.1038/s41421-023-00549-9