Machine learning in the estimation of CRISPR-Cas9 cleavage sites for plant system
Main authors: | Das, Jutan; Kumar, Sanjeev; Mishra, Dwijesh Chandra; Chaturvedi, Krishna Kumar; Paul, Ranjit Kumar; Kairi, Amit |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A., 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9868961/ https://www.ncbi.nlm.nih.gov/pubmed/36699447 http://dx.doi.org/10.3389/fgene.2022.1085332 |
_version_ | 1784876663027269632 |
---|---|
author | Das, Jutan; Kumar, Sanjeev; Mishra, Dwijesh Chandra; Chaturvedi, Krishna Kumar; Paul, Ranjit Kumar; Kairi, Amit |
collection | PubMed |
description | The CRISPR-Cas9 system is one of the most widely used recent genome-editing techniques. Although it can precisely alter the target genes and genomic regions complementary to the designed guide RNA (sgRNA), off-target effects still occur. Machine learning algorithms for this problem already exist for humans, animals, and a few plant species. In this paper, models based on three machine learning techniques [artificial neural networks (ANN), support vector machines (SVM), and random forests (RF)] were built to predict which CRISPR-Cas9 cleavage sites a given sgRNA will cleave. All of these models were developed exclusively from plant data. Of the collected on-target and off-target dataset covering various plant species, 70% was used to train the models and the remaining 30% to evaluate their performance using specificity, sensitivity, accuracy, precision, F1 score, F2 score, and AUC. Eleven models were developed in all: six ANN-based models (ANN1-Logistic, ANN1-Tanh, ANN1-ReLU, ANN2-Logistic, ANN2-Tanh, and ANN2-ReLU), four SVM-based models (SVM-Linear, SVM-Polynomial, SVM-Gaussian, and SVM-Sigmoid), and one Random Forest model. Comparative analysis of these models shows that the Random Forest model performs best overall, with an accuracy of 96.27% and an AUC of 99.21%. Among the ANN-based and SVM-based models, ANN1-ReLU and SVM-Linear, respectively, performed best. |
format | Online Article Text |
id | pubmed-9868961 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-9868961, 2023-01-24. Front Genet; Genetics. Frontiers Media S.A., 2023-01-09. Copyright © 2023 Das, Kumar, Mishra, Chaturvedi, Paul and Kairi. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
title | Machine learning in the estimation of CRISPR-Cas9 cleavage sites for plant system |
topic | Genetics |
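
The abstract states that 70% of the collected on-target and off-target data was used for training and the remaining 30% for testing, but this record does not say how the split was drawn. The following is a minimal pure-Python sketch that assumes, hypothetically, a stratified random split, so that the ratio of cleaved to non-cleaved sites is the same in both partitions. `stratified_split`, the fixed seed, and the toy site identifiers are illustrative only and are not the authors' procedure.

```python
import random
from typing import List, Sequence, Tuple


def stratified_split(
    items: Sequence[str],
    labels: Sequence[int],
    train_frac: float = 0.7,
    seed: int = 42,
) -> Tuple[List[Tuple[str, int]], List[Tuple[str, int]]]:
    """Split (item, label) pairs into train/test sets class by class.

    Assumption: stratification by label (1 = cleaved, 0 = not cleaved);
    the source only states the 70/30 proportion.
    """
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train: List[Tuple[str, int]] = []
    test: List[Tuple[str, int]] = []
    for cls in sorted(set(labels)):
        group = [x for x, y in zip(items, labels) if y == cls]
        rng.shuffle(group)
        cut = round(train_frac * len(group))  # 70% of this class goes to training
        train.extend((x, cls) for x in group[:cut])
        test.extend((x, cls) for x in group[cut:])
    return train, test


# Toy data: 10 hypothetical positive and 10 negative site IDs.
sites = [f"site{i}" for i in range(20)]
labels = [1] * 10 + [0] * 10
train, test = stratified_split(sites, labels)
print(len(train), len(test))  # 14 6
```

Stratifying keeps both partitions representative when one class, such as off-target sites, is much rarer than the other. A plain shuffled 70/30 split would also match the abstract's description.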
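
The abstract names the evaluation metrics but does not define them. The following is a minimal pure-Python sketch of their standard definitions. It assumes binary labels (1 = site cleaved, the positive class) and a model score in [0, 1] that is converted into a prediction at a hypothetical 0.5 threshold. The function names and toy data are illustrative and are not the authors' code. Sensitivity is recall on the positive class. F2 is the F-beta score with beta = 2, which weights sensitivity more heavily than precision. AUC is computed through its rank formulation: the probability that a randomly chosen positive scores higher than a randomly chosen negative, with ties counting one half.

```python
from typing import Dict, Sequence, Tuple


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (TP, FP, TN, FN) for binary labels, where 1 is the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == 1 and t == 1:
            tp += 1
        elif p == 1 and t == 0:
            fp += 1
        elif p == 0 and t == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def f_beta(precision: float, recall: float, beta: float) -> float:
    """F-beta = (1 + b^2) * P * R / (b^2 * P + R); beta=1 gives F1, beta=2 gives F2."""
    b2 = beta * beta
    denom = b2 * precision + recall
    return (1 + b2) * precision * recall / denom if denom > 0 else 0.0


def roc_auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as P(score of random positive > score of random negative), ties = 0.5."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def evaluate(y_true: Sequence[int], scores: Sequence[float], threshold: float = 0.5) -> Dict[str, float]:
    y_pred = [1 if s >= threshold else 0 for s in scores]
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # true positive rate (recall)
    specificity = tn / (tn + fp) if tn + fp else 0.0  # true negative rate
    precision = tp / (tp + fp) if tp + fp else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "F1": f_beta(precision, sensitivity, 1.0),
        "F2": f_beta(precision, sensitivity, 2.0),
        "AUC": roc_auc(y_true, scores),  # threshold-free, uses raw scores
    }


# Toy example: three positive and three negative sites with hypothetical scores.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.2, 0.1]
metrics = evaluate(y_true, scores)
print(round(metrics["AUC"], 4))  # 0.8889
```

All metrics except AUC depend on the chosen threshold. AUC summarizes ranking quality over every possible threshold, which is why it can be high even when accuracy at one fixed cut-off is modest.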
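
Each of the eleven model names combines a technique with either an activation function (ANN) or a kernel (SVM). The following is a small pure-Python sketch of the textbook forms these names usually refer to, and it rebuilds the 6 + 4 + 1 roster. The kernel parameters `gamma`, `coef0` and `degree` are illustrative defaults and are assumptions. The ANN1/ANN2 prefixes are copied from the labels as they appear. This record does not say what distinguishes the two network variants, so the sketch does not interpret them.

```python
import math
from typing import Callable, Dict, List, Sequence

Vector = Sequence[float]


def logistic(z: float) -> float:
    """Numerically stable logistic (sigmoid) activation."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Activation functions named in the ANN model labels.
ACTIVATIONS: Dict[str, Callable[[float], float]] = {
    "Logistic": logistic,
    "Tanh": math.tanh,
    "ReLU": lambda z: max(0.0, z),
}


def dot(x: Vector, y: Vector) -> float:
    return sum(a * b for a, b in zip(x, y))


def make_kernels(
    gamma: float = 0.5, coef0: float = 1.0, degree: int = 3
) -> Dict[str, Callable[[Vector, Vector], float]]:
    """Standard SVM kernel forms; the parameter values are illustrative assumptions."""
    return {
        "Linear": lambda x, y: dot(x, y),
        "Polynomial": lambda x, y: (gamma * dot(x, y) + coef0) ** degree,
        # Gaussian = radial basis function (RBF) kernel
        "Gaussian": lambda x, y: math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y))),
        "Sigmoid": lambda x, y: math.tanh(gamma * dot(x, y) + coef0),
    }


def model_roster() -> List[str]:
    """Rebuild the eleven model labels: 2 x 3 ANN + 4 SVM + 1 RF."""
    ann = [f"ANN{variant}-{act}" for variant in (1, 2) for act in ACTIVATIONS]
    svm = [f"SVM-{name}" for name in make_kernels()]
    return ann + svm + ["RF"]


roster = model_roster()
print(len(roster))  # 11
kernels = make_kernels()
print(kernels["Polynomial"]([1.0, 0.0], [1.0, 1.0]))  # 3.375
```

The kernel choice fixes the shape of the SVM decision boundary. The linear kernel gives a hyperplane in feature space, while the polynomial, Gaussian and sigmoid kernels allow non-linear boundaries at the cost of extra hyperparameters.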