Deep sampling of gRNA in the human genome and deep-learning-informed prediction of gRNA activities
Life science studies involving clustered regularly interspaced short palindromic repeat (CRISPR) editing generally apply the best-performing guide RNA (gRNA) for a gene of interest. Computational models are combined with massive experimental quantification on synthetic gRNA-target libraries to accur...
Main Authors: | Zhang, Heng; Yan, Jianfeng; Lu, Zhike; Zhou, Yangfan; Zhang, Qingfeng; Cui, Tingting; Li, Yini; Chen, Hui; Ma, Lijia |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Springer Nature Singapore 2023 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10188485/ https://www.ncbi.nlm.nih.gov/pubmed/37193681 http://dx.doi.org/10.1038/s41421-023-00549-9 |
_version_ | 1785042923921866752 |
---|---|
author | Zhang, Heng; Yan, Jianfeng; Lu, Zhike; Zhou, Yangfan; Zhang, Qingfeng; Cui, Tingting; Li, Yini; Chen, Hui; Ma, Lijia |
author_sort | Zhang, Heng |
collection | PubMed |
description | Life science studies involving clustered regularly interspaced short palindromic repeat (CRISPR) editing generally apply the best-performing guide RNA (gRNA) for a gene of interest. Computational models are combined with massive experimental quantification on synthetic gRNA-target libraries to accurately predict gRNA activity and mutational patterns. However, the measurements are inconsistent between studies due to differences in the designs of the gRNA-target pair constructs, and there has not yet been an integrated investigation that concurrently focuses on multiple facets of gRNA capacity. In this study, we analyzed the DNA double-strand break (DSB)-induced repair outcomes and measured SpCas9/gRNA activities at both matched and mismatched locations using 926,476 gRNAs covering 19,111 protein-coding genes and 20,268 non-coding genes. We developed machine learning models to forecast the on-target cleavage efficiency (AIdit_ON), off-target cleavage specificity (AIdit_OFF), and mutational profiles (AIdit_DSB) of SpCas9/gRNA from a uniformly collected and processed dataset by deep sampling and massively quantifying gRNA capabilities in K562 cells. Each of these models exhibited superlative performance in predicting SpCas9/gRNA activities on independent datasets when benchmarked against previous models. A previously unknown parameter, the “sweet spot” in the size of the dataset needed to establish an effective model for predicting gRNA capabilities at a manageable experimental scale, was also determined empirically. In addition, we observed cell type-specific mutational profiles and were able to identify nucleotidylexotransferase as the key factor driving these outcomes. These massive datasets and deep learning algorithms have been implemented in the user-friendly web service http://crispr-aidit.com to evaluate and rank gRNAs for life science studies. |
format | Online Article Text |
id | pubmed-10188485 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Springer Nature Singapore |
record_format | MEDLINE/PubMed |
spelling | pubmed-10188485 2023-05-18 Deep sampling of gRNA in the human genome and deep-learning-informed prediction of gRNA activities. Zhang, Heng; Yan, Jianfeng; Lu, Zhike; Zhou, Yangfan; Zhang, Qingfeng; Cui, Tingting; Li, Yini; Chen, Hui; Ma, Lijia. Cell Discov. Article. Springer Nature Singapore 2023-05-16 /pmc/articles/PMC10188485/ /pubmed/37193681 http://dx.doi.org/10.1038/s41421-023-00549-9 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit https://creativecommons.org/licenses/by/4.0/. |
title | Deep sampling of gRNA in the human genome and deep-learning-informed prediction of gRNA activities |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10188485/ https://www.ncbi.nlm.nih.gov/pubmed/37193681 http://dx.doi.org/10.1038/s41421-023-00549-9 |