AttCRISPR: a spacetime interpretable model for prediction of sgRNA on-target activity

Bibliographic details
Main authors: Xiao, Li-Ming; Wan, Yun-Qi; Jiang, Zhen-Ran
Format: Online, Article, Text
Language: English
Published: BioMed Central, 2021
Subjects: Research
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8667445/
https://www.ncbi.nlm.nih.gov/pubmed/34903170
http://dx.doi.org/10.1186/s12859-021-04509-6

Description:
BACKGROUND: An increasing number of Cas9 variants with higher specificity have been developed to avoid off-target effects, producing a substantial volume of experimental data. Conventional machine learning methods perform poorly on these datasets, whereas deep learning-based methods often lack interpretability, forcing researchers to trade off accuracy against interpretability. A method is therefore needed that matches deep learning-based methods in performance while offering interpretability comparable to that of conventional machine learning methods.
RESULTS: To overcome these problems, we propose AttCRISPR, an intrinsically interpretable deep learning-based method for predicting sgRNA on-target activity. The key advantage of AttCRISPR is its use of an ensemble learning strategy to stack available encoding-based and embedding-based methods with strong interpretability. Compared with state-of-the-art methods on several public datasets (WT-SpCas9, eSpCas9(1.1), and SpCas9-HF1), AttCRISPR achieves average Spearman correlations of 0.872, 0.867, and 0.867, respectively, outperforming these methods. Furthermore, AttCRISPR benefits from two attention modules, one spatial and one temporal, which give it good interpretability. Through these modules, the decisions made by AttCRISPR can be understood at both global and local levels without additional post hoc explanation techniques.
CONCLUSION: Using the trained models, we reveal at a global level the preference for each position-dependent nucleotide in the sgRNA (short guide RNA) sequence of each dataset. At a local level, we demonstrate that the interpretability of AttCRISPR can guide researchers in designing sgRNAs with higher activity.
SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-021-04509-6.
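
The performance figures above are average Spearman correlations, i.e. rank correlations between predicted and measured on-target activity. As an illustration only, and not the authors' evaluation code, a minimal pure-Python sketch of Spearman's rho (tied values receive average ranks, then the Pearson correlation is computed on those ranks) might look like this; the function names `average_ranks`, `pearson` and `spearman` and the toy activity values are hypothetical.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks of `values`; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Advance j to the last element of the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        # Sorted positions i..j (0-based) correspond to ranks i+1..j+1; use their mean.
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    norm_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    if norm_x == 0 or norm_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (norm_x * norm_y)


def spearman(predicted: Sequence[float], measured: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two equal-length sequences with at least 2 items")
    return pearson(average_ranks(predicted), average_ranks(measured))


# Hypothetical toy data: predicted vs measured activity for four sgRNAs.
predicted = [0.10, 0.40, 0.35, 0.80]
measured = [0.20, 0.30, 0.50, 0.90]
print(round(spearman(predicted, measured), 3))  # prints 0.8
```

Because only ranks enter the score, any monotone rescaling of the predicted activities leaves it unchanged, which is one common reason to prefer it when activity scores are measured on different scales across datasets.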

Journal: BMC Bioinformatics (Research), published 2021-12-13 by BioMed Central
License: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.