CGD: Comprehensive guide designer for CRISPR-Cas systems

Bibliographic details
Main authors: Menon, A Vipin; Sohn, Jang-il; Nam, Jin-Wu
Format: Online, Article, Text
Language: English
Published: Research Network of Computational and Structural Biotechnology, 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7152703/
https://www.ncbi.nlm.nih.gov/pubmed/32308928
http://dx.doi.org/10.1016/j.csbj.2020.03.020
Collection: PubMed

Abstract

The Clustered Regularly Interspaced Short Palindromic Repeats (CRISPR)-Cas systems, including dead Cas9 (dCas9), Cas9, and Cas12a, have revolutionized genome engineering in mammalian somatic cells. Computational tools that assess the target sites of CRISPR-Cas systems are essential for designing efficient guide RNAs (gRNAs), but existing tools have generalization problems in feature selection and do not provide optimal results comprehensively. Here, we introduce a Comprehensive Guide Designer (CGD) for four different CRISPR systems, which uses a machine learning algorithm, Elastic Net Logistic Regression (ENLOR), to generalize the models autonomously. CGD contains specific models trained in an unbiased manner with public datasets generated by CRISPRi, CRISPRa, CRISPR-Cas9, and CRISPR-Cas12a (designated CGDi, CGDa, CGD9, and CGD12a, respectively). The trained CGD models were benchmarked against other regression-based machine learning models, such as ElasticNet Linear Regression (ENLR), Random Forest and Boruta (RFB), and Extreme Gradient Boosting (Xgboost), with built-in feature selection. Evaluation on independent test datasets showed that the CGD models outperformed pre-existing methods in predicting gRNA efficacy. All CGD source code and datasets are available on GitHub (https://github.com/vipinmenon1989/CGD), and the CGD webserver can be accessed at http://big.hanyang.ac.kr:2195/CGD.
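The abstract names Elastic Net Logistic Regression (ENLOR) as the learning algorithm behind the CGD models. To illustrate what that technique does, here is a minimal plain-Python sketch that fits an elastic-net-penalized logistic regression by proximal gradient descent. It is not the CGD implementation. Everything in it is an assumption made for illustration: the position-specific one-hot features, the glmnet-style penalty parameters `lam` (overall strength) and `alpha` (L1 share), the solver, and the hypothetical toy 3-mer dataset with its made-up labelling rule. The actual CGD code and feature sets are in the authors' GitHub repository.

```python
import itertools
import math

BASES = "ACGT"


def one_hot(seq):
    """Position-specific one-hot encoding: 4 binary features per nucleotide."""
    return [1.0 if base == b else 0.0 for base in seq.upper() for b in BASES]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def soft_threshold(w, t):
    """Proximal operator of t * |w|; sets small weights exactly to zero."""
    if w > t:
        return w - t
    if w < -t:
        return w + t
    return 0.0


def fit_enet_logistic(X, y, lam=0.01, alpha=0.5, lr=0.5, epochs=2000):
    """Minimize mean log-loss + lam * (alpha*||w||_1 + (1-alpha)/2*||w||_2^2)
    by proximal gradient descent (ISTA). The intercept b is not penalized."""
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(epochs):
        # Full-batch gradient of the mean log-loss at the current (w, b).
        grad_w, grad_b = [0.0] * p, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            grad_b += err / n
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj / n
        b -= lr * grad_b
        for j in range(p):
            # Gradient step on the smooth part (log-loss + L2), then L1 prox.
            g = grad_w[j] + lam * (1.0 - alpha) * w[j]
            w[j] = soft_threshold(w[j] - lr * g, lr * lam * alpha)
    return w, b


def predict_proba(w, b, seq):
    """Predicted probability that a sequence is labelled 1."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, one_hot(seq))))


# Hypothetical toy data: every 3-mer, labelled 1 ("efficient") when its last
# base is G. The rule is purely illustrative, not a biological claim.
seqs = ["".join(s) for s in itertools.product(BASES, repeat=3)]
labels = [1 if s[-1] == "G" else 0 for s in seqs]
w, b = fit_enet_logistic([one_hot(s) for s in seqs], labels)

print("weight for G at position 3:", round(w[4 * 2 + BASES.index("G")], 2))
print("weights set to zero:", sum(1 for wj in w if wj == 0.0), "of", len(w))
```

The L1 term drives uninformative weights to exactly zero, which is the built-in feature selection, while the L2 term keeps the fit stable when features are correlated. In this toy run the weights for positions 1 and 2, which carry no information under the made-up rule, should end up at zero. For real data, scikit-learn's `LogisticRegression(penalty="elasticnet", solver="saga", l1_ratio=...)` would be the more usual choice than a hand-written solver.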

Journal: Comput Struct Biotechnol J (Research Article)
Date: 2020-03-25
Record ID: pubmed-7152703 (MEDLINE/PubMed record, National Center for Biotechnology Information)
License: © 2020 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).