
Machine learning in the estimation of CRISPR-Cas9 cleavage sites for plant system

The CRISPR-Cas9 system is one of the most widely used recent genome editing techniques. Although it has a high capacity to alter precisely the target genes and genomic regions complementary to the designed guide RNA (sgRNA), off-target effects still occur. But there are already machine learning algorithms f...


Bibliographic Details
Main authors: Das, Jutan; Kumar, Sanjeev; Mishra, Dwijesh Chandra; Chaturvedi, Krishna Kumar; Paul, Ranjit Kumar; Kairi, Amit
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Genetics
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9868961/
https://www.ncbi.nlm.nih.gov/pubmed/36699447
http://dx.doi.org/10.3389/fgene.2022.1085332
author Das, Jutan
Kumar, Sanjeev
Mishra, Dwijesh Chandra
Chaturvedi, Krishna Kumar
Paul, Ranjit Kumar
Kairi, Amit
collection PubMed
description The CRISPR-Cas9 system is one of the most widely used recent genome editing techniques. Although it has a high capacity to alter precisely the target genes and genomic regions complementary to the designed guide RNA (sgRNA), off-target effects still occur. Machine learning algorithms for this purpose already exist for humans, animals, and a few plant species. In this paper, models based on three machine learning techniques, namely artificial neural networks (ANN), support vector machines (SVM), and random forests (RF), were developed to predict the CRISPR-Cas9 cleavage sites that a particular sgRNA will cleave. All of the models were built exclusively on plant data. 70% of the collected on-target and off-target dataset from various plant species was used to train the models, and the remaining 30% was used to evaluate their performance with several metrics: specificity, sensitivity, accuracy, precision, F1 score, F2 score, and AUC. Eleven models were developed in all: six ANN-based models (ANN1-Logistic, ANN1-Tanh, ANN1-ReLU, ANN2-Logistic, ANN2-Tanh, and ANN2-ReLU), four SVM-based models (SVM-Linear, SVM-Polynomial, SVM-Gaussian, and SVM-Sigmoid), and one RF model. Comparative analysis suggests that the random forest model performs best overall, with an accuracy of 96.27% and an AUC of 99.21%. Among the ANN-based and SVM-based models, ANN1-ReLU and SVM-Linear performed best, respectively.
format Online
Article
Text
id pubmed-9868961
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-9868961 2023-01-24
Machine learning in the estimation of CRISPR-Cas9 cleavage sites for plant system
Das, Jutan; Kumar, Sanjeev; Mishra, Dwijesh Chandra; Chaturvedi, Krishna Kumar; Paul, Ranjit Kumar; Kairi, Amit
Front Genet
Genetics
Frontiers Media S.A. 2023-01-09
/pmc/articles/PMC9868961/ /pubmed/36699447 http://dx.doi.org/10.3389/fgene.2022.1085332
Text en
Copyright © 2023 Das, Kumar, Mishra, Chaturvedi, Paul and Kairi. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Machine learning in the estimation of CRISPR-Cas9 cleavage sites for plant system
topic Genetics
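
The abstract describes a 70/30 train/test split and names seven evaluation metrics (specificity, sensitivity, accuracy, precision, F1 score, F2 score, and AUC) without defining them. The following is a minimal sketch of how such a split and these metrics are commonly computed for a binary classifier. It is not the authors' code: the function names, the label encoding (1 = cleaved site, 0 = not cleaved), the random seed, and the toy labels and scores are all assumptions made for illustration.

```python
import random


def split_70_30(records, seed=0):
    """Shuffle a copy of `records` and return (train, test) in a 70/30 ratio.

    The seed and the plain (non-stratified) shuffle are illustrative choices.
    """
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(0.7 * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


def f_beta(precision, recall, beta):
    """F-beta score; beta=1 gives F1, beta=2 weights recall more heavily."""
    denom = beta ** 2 * precision + recall
    return (1 + beta ** 2) * precision * recall / denom if denom else 0.0


def binary_metrics(y_true, y_pred):
    """Confusion-matrix metrics for labels encoded as 1 (positive) / 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # recall, true positive rate
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(pairs) if pairs else 0.0,
        "precision": precision,
        "f1": f_beta(precision, sensitivity, 1),
        "f2": f_beta(precision, sensitivity, 2),
    }


def auc(y_true, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly.

    Ties count as half a correct pair. Quadratic in the number of samples,
    which is fine for a small illustration.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    correct = sum(1.0 if p > n else 0.5 if p == n else 0.0
                  for p in pos for n in neg)
    return correct / (len(pos) * len(neg))


# Toy data (hypothetical, not from the paper).
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2, 0.1, 0.35, 0.05, 0.15]

train, test = split_70_30(range(10))
metrics = binary_metrics(y_true, y_pred)
area = auc(y_true, scores)
```

The F2 score uses the same formula as F1 with beta set to 2, so it penalizes missed cleavage sites (false negatives) more than false alarms. That is one plausible reason to report it alongside F1 when off-target sites are costly to miss.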