CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning
Main authors: | Muhammad Rafid, Ali Haisam; Toufikuzzaman, Md.; Rahman, Mohammad Saifur; Rahman, M. Sohel |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2020 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7268231/ https://www.ncbi.nlm.nih.gov/pubmed/32487025 http://dx.doi.org/10.1186/s12859-020-3531-9 |
author | Muhammad Rafid, Ali Haisam; Toufikuzzaman, Md.; Rahman, Mohammad Saifur; Rahman, M. Sohel |
collection | PubMed |
description | BACKGROUND: The latest works on CRISPR genome editing tools mainly employ deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool using sequence-based features and traditional machine learning that can compete with deep learning models. RESULTS: In this paper, we present CRISPRpred(SEQ), a method for sgRNA on-target activity prediction that leverages only traditional machine learning techniques and hand-crafted features extracted from sgRNA sequences. We compare the results of CRISPRpred(SEQ) with those of DeepCRISPR, the current state of the art, which uses a deep learning pipeline. Despite using only traditional machine learning methods, we have been able to convincingly beat DeepCRISPR for three out of the four cell lines in the benchmark dataset (2.174%, 6.905% and 8.119% improvement for the three cell lines). CONCLUSION: CRISPRpred(SEQ) has been able to convincingly beat DeepCRISPR in 3 out of 4 cell lines. We believe that by exploring further, one can design better features using only the sgRNA sequences and come up with a better method, leveraging only traditional machine learning algorithms, that can fully beat the deep learning models. |
format | Online Article Text |
id | pubmed-7268231 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7268231 (2020-06-07). BMC Bioinformatics, Methodology Article.
BioMed Central 2020-06-01 /pmc/articles/PMC7268231/ /pubmed/32487025 http://dx.doi.org/10.1186/s12859-020-3531-9 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7268231/ https://www.ncbi.nlm.nih.gov/pubmed/32487025 http://dx.doi.org/10.1186/s12859-020-3531-9 |
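The description mentions "hand-crafted features extracted from sgRNA sequences" without showing what such features look like. The following is a minimal Python sketch under stated assumptions; it is not the CRISPRpred(SEQ) feature set. It assumes a 23-nt input (a 20-nt protospacer plus an NGG PAM) and builds three illustrative feature groups: a position-specific one-hot encoding, position-independent k-mer counts for k = 1..2, and GC content. All function names are hypothetical.

```python
from itertools import product

NUCLEOTIDES = "ACGT"


def position_specific_features(seq: str) -> list[int]:
    """Hypothetical: one-hot encode each position (4 binary features per base)."""
    features = []
    for base in seq:
        features.extend(1 if base == n else 0 for n in NUCLEOTIDES)
    return features


def kmer_count_features(seq: str, k: int) -> list[int]:
    """Hypothetical: position-independent counts of all 4**k possible k-mers."""
    kmers = ["".join(p) for p in product(NUCLEOTIDES, repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = [0] * len(kmers)
    # Slide a window of width k over the sequence and tally each k-mer.
    for i in range(len(seq) - k + 1):
        counts[index[seq[i:i + k]]] += 1
    return counts


def gc_content(seq: str) -> float:
    """Fraction of G and C bases in the sequence."""
    return (seq.count("G") + seq.count("C")) / len(seq)


def extract_features(seq: str, max_k: int = 2) -> list[float]:
    """Hypothetical feature vector: one-hot + k-mer counts (k=1..max_k) + GC."""
    seq = seq.upper()
    if not seq or set(seq) - set(NUCLEOTIDES):
        raise ValueError("sequence must be non-empty and contain only A, C, G, T")
    features: list[float] = list(position_specific_features(seq))
    for k in range(1, max_k + 1):
        features.extend(kmer_count_features(seq, k))
    features.append(gc_content(seq))
    return features


if __name__ == "__main__":
    # Assumed 23-nt example: 20-nt protospacer followed by a TGG PAM.
    sgrna = "GACGCATAAAGATGAGACGCTGG"
    vector = extract_features(sgrna)
    print(len(vector))  # 23*4 + 4 + 16 + 1 = 113 features
```

The resulting fixed-length vectors could then be passed to any traditional learner, such as a random forest or gradient-boosted trees. Which learner, feature groups, and feature-selection steps CRISPRpred(SEQ) actually uses should be checked against the article itself.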