
A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction

The reprogrammable CRISPR/Cas9 genome editing tool’s growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity rema...


Bibliographic Details
Main authors: Vora, Dhvani Sandip, Verma, Yugesh, Sundar, Durai
Format: Online Article Text
Language: English
Published: MDPI 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405635/
https://www.ncbi.nlm.nih.gov/pubmed/36009017
http://dx.doi.org/10.3390/biom12081123
_version_ 1784773926203686912
author Vora, Dhvani Sandip
Verma, Yugesh
Sundar, Durai
author_facet Vora, Dhvani Sandip
Verma, Yugesh
Sundar, Durai
author_sort Vora, Dhvani Sandip
collection PubMed
description The reprogrammable CRISPR/Cas9 genome editing tool’s growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity remain obscure. Based on sequence features, previous machine learning models performed poorly on new datasets, thus there is a need for the incorporation of novel features. The binding energy estimation of the gRNA-DNA hybrid as well as the Cas9-gRNA-DNA hybrid allowed generating better performing machine learning models for the prediction of Cas9 activity. The analysis of feature contribution towards the model output on a limited dataset indicated that energy features played a determining role along with the sequence features. The binding energy features proved essential for the prediction of on-target activity and off-target sites. The plateau, in the performance on unseen datasets, of current machine learning models could be overcome by incorporating novel features, such as binding energy, among others. The models are provided on GitHub (GitHub Inc., San Francisco, CA, USA).
format Online
Article
Text
id pubmed-9405635
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-9405635 2022-08-26 A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction Vora, Dhvani Sandip Verma, Yugesh Sundar, Durai Biomolecules Article The reprogrammable CRISPR/Cas9 genome editing tool’s growing popularity is hindered by unwanted off-target effects. Efforts have been directed toward designing efficient guide RNAs as well as identifying potential off-target threats, yet factors that determine efficiency and off-target activity remain obscure. Based on sequence features, previous machine learning models performed poorly on new datasets, thus there is a need for the incorporation of novel features. The binding energy estimation of the gRNA-DNA hybrid as well as the Cas9-gRNA-DNA hybrid allowed generating better performing machine learning models for the prediction of Cas9 activity. The analysis of feature contribution towards the model output on a limited dataset indicated that energy features played a determining role along with the sequence features. The binding energy features proved essential for the prediction of on-target activity and off-target sites. The plateau, in the performance on unseen datasets, of current machine learning models could be overcome by incorporating novel features, such as binding energy, among others. The models are provided on GitHub (GitHub Inc., San Francisco, CA, USA). MDPI 2022-08-16 /pmc/articles/PMC9405635/ /pubmed/36009017 http://dx.doi.org/10.3390/biom12081123 Text en © 2022 by the authors. https://creativecommons.org/licenses/by/4.0/ Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
spellingShingle Article
Vora, Dhvani Sandip
Verma, Yugesh
Sundar, Durai
A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction
title A Machine Learning Approach to Identify the Importance of Novel Features for CRISPR/Cas9 Activity Prediction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9405635/
https://www.ncbi.nlm.nih.gov/pubmed/36009017
http://dx.doi.org/10.3390/biom12081123