
Machine learning in the estimation of CRISPR-Cas9 cleavage sites for plant system

The CRISPR-Cas9 system is one of the most widely used recent genome editing techniques. Although it can precisely alter the target genes and genomic regions complementary to the designed guide RNA (sgRNA), off-target effects still occur. But there are already machine learning algorithms f...


Bibliographic Details
Main Authors:	Das, Jutan; Kumar, Sanjeev; Mishra, Dwijesh Chandra; Chaturvedi, Krishna Kumar; Paul, Ranjit Kumar; Kairi, Amit
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Genetics
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9868961/ https://www.ncbi.nlm.nih.gov/pubmed/36699447 http://dx.doi.org/10.3389/fgene.2022.1085332

_version_	1784876663027269632
author	Das, Jutan; Kumar, Sanjeev; Mishra, Dwijesh Chandra; Chaturvedi, Krishna Kumar; Paul, Ranjit Kumar; Kairi, Amit
author_facet	Das, Jutan; Kumar, Sanjeev; Mishra, Dwijesh Chandra; Chaturvedi, Krishna Kumar; Paul, Ranjit Kumar; Kairi, Amit
author_sort	Das, Jutan
collection	PubMed
description	The CRISPR-Cas9 system is one of the most widely used recent genome editing techniques. Although it can precisely alter the target genes and genomic regions complementary to the designed guide RNA (sgRNA), off-target effects still occur. Machine learning algorithms for this problem already exist for humans, animals, and a few plant species. In this paper, models were built with three machine learning techniques, namely artificial neural networks (ANN), support vector machines (SVM), and random forests (RF), to predict the CRISPR-Cas9 cleavage sites that a particular sgRNA will cleave. All of these models were built solely from plant data. Of the on-target and off-target data gathered from various plant species, 70% was used to train the models. The remaining 30% was used to evaluate model performance with several metrics: specificity, sensitivity, accuracy, precision, F1 score, F2 score, and AUC. Eleven models were developed in all: one Random Forest model, six ANN-based models (ANN1-Logistic, ANN1-Tanh, ANN1-ReLU, ANN2-Logistic, ANN2-Tanh, and ANN2-ReLU), and four SVM models (SVM-Linear, SVM-Polynomial, SVM-Gaussian, and SVM-Sigmoid). Comparative analysis suggests that the Random Forest model performs best overall, with an accuracy of 96.27% and an AUC of 99.21%. Among the ANN-based and SVM-based models, ANN1-ReLU and SVM-Linear performed best, respectively.
format	Online Article Text
id	pubmed-9868961
institution	National Center for Biotechnology Information
language	English
publishDate	2023
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9868961 2023-01-24 Machine learning in the estimation of CRISPR-Cas9 cleavage sites for plant system Das, Jutan; Kumar, Sanjeev; Mishra, Dwijesh Chandra; Chaturvedi, Krishna Kumar; Paul, Ranjit Kumar; Kairi, Amit Front Genet Genetics Frontiers Media S.A. 2023-01-09 /pmc/articles/PMC9868961/ /pubmed/36699447 http://dx.doi.org/10.3389/fgene.2022.1085332 Text en Copyright © 2023 Das, Kumar, Mishra, Chaturvedi, Paul and Kairi. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Machine learning in the estimation of CRISPR-Cas9 cleavage sites for plant system
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9868961/ https://www.ncbi.nlm.nih.gov/pubmed/36699447 http://dx.doi.org/10.3389/fgene.2022.1085332
