
Hybrid Multitask Learning Reveals Sequence Features Driving Specificity in the CRISPR/Cas9 System


Bibliographic Details
Main Authors: Vora, Dhvani Sandip; Yadav, Shashank; Sundar, Durai
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10135716/
https://www.ncbi.nlm.nih.gov/pubmed/37189388
http://dx.doi.org/10.3390/biom13040641
author Vora, Dhvani Sandip
Yadav, Shashank
Sundar, Durai
collection PubMed
description CRISPR/Cas9 technology is capable of precisely editing genomes and is at the heart of various recent scientific and medical advances. Progress in biomedical research is hindered by the inadvertent burden placed on the genome when genome editors are employed: the off-target effects. Although experimental screens to detect off-targets have improved understanding of Cas9 activity, that knowledge remains incomplete because the rules do not extrapolate well to new target sequences. Because the rules that drive Cas9 activity are not fully understood, recently developed off-target prediction tools have increasingly relied on machine learning and deep learning techniques to assess the full threat of likely off-targets. In this study, we present a count-based as well as a deep-learning-based approach to derive sequence features that are important in determining Cas9 activity at a sequence. There are two major challenges in off-target determination: the identification of a likely site of Cas9 activity and the prediction of the extent of Cas9 activity at that site. The hybrid multitask CNN–biLSTM model developed, named CRISP–RCNN, simultaneously predicts off-targets and the extent of activity on off-targets. Using integrated gradients and weighting kernels to approximate feature importance, we analyzed nucleotide and position preference and mismatch tolerance.
format Online
Article
Text
id pubmed-10135716
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10135716 2023-04-28 Hybrid Multitask Learning Reveals Sequence Features Driving Specificity in the CRISPR/Cas9 System Vora, Dhvani Sandip Yadav, Shashank Sundar, Durai Biomolecules Article MDPI 2023-04-03 /pmc/articles/PMC10135716/ /pubmed/37189388 http://dx.doi.org/10.3390/biom13040641 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title Hybrid Multitask Learning Reveals Sequence Features Driving Specificity in the CRISPR/Cas9 System
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10135716/
https://www.ncbi.nlm.nih.gov/pubmed/37189388
http://dx.doi.org/10.3390/biom13040641