
CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning

BACKGROUND: The latest works on CRISPR genome editing tools mainly employ deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool using sequence-based features and traditional machine le...


Bibliographic Details
Main authors: Muhammad Rafid, Ali Haisam; Toufikuzzaman, Md.; Rahman, Mohammad Saifur; Rahman, M. Sohel
Format: Online Article Text
Language: English
Published: BioMed Central 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7268231/
https://www.ncbi.nlm.nih.gov/pubmed/32487025
http://dx.doi.org/10.1186/s12859-020-3531-9
_version_ 1783541571131539456
author Muhammad Rafid, Ali Haisam
Toufikuzzaman, Md.
Rahman, Mohammad Saifur
Rahman, M. Sohel
author_facet Muhammad Rafid, Ali Haisam
Toufikuzzaman, Md.
Rahman, Mohammad Saifur
Rahman, M. Sohel
author_sort Muhammad Rafid, Ali Haisam
collection PubMed
description BACKGROUND: The latest works on CRISPR genome editing tools mainly employ deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool using sequence-based features and traditional machine learning that can compete with deep learning models. RESULTS: In this paper, we present CRISPRpred(SEQ), a method for sgRNA on-target activity prediction that leverages only traditional machine learning techniques and hand-crafted features extracted from sgRNA sequences. We compare the results of CRISPRpred(SEQ) with those of DeepCRISPR, the current state-of-the-art, which uses a deep learning pipeline. Despite using only traditional machine learning methods, we have been able to convincingly beat DeepCRISPR on three out of four cell lines in the benchmark dataset (2.174%, 6.905% and 8.119% improvement for the three cell lines). CONCLUSION: CRISPRpred(SEQ) has been able to convincingly beat DeepCRISPR in 3 out of 4 cell lines. We believe that, with further exploration, one can design better features using only the sgRNA sequences and arrive at a better method, leveraging only traditional machine learning algorithms, that can fully beat the deep learning models.
format Online
Article
Text
id pubmed-7268231
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-72682312020-06-07 CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning Muhammad Rafid, Ali Haisam Toufikuzzaman, Md. Rahman, Mohammad Saifur Rahman, M. Sohel BMC Bioinformatics Methodology Article BACKGROUND: The latest works on CRISPR genome editing tools mainly employ deep learning techniques. However, deep learning models lack explainability and are harder to reproduce. We were motivated to build an accurate genome editing tool using sequence-based features and traditional machine learning that can compete with deep learning models. RESULTS: In this paper, we present CRISPRpred(SEQ), a method for sgRNA on-target activity prediction that leverages only traditional machine learning techniques and hand-crafted features extracted from sgRNA sequences. We compare the results of CRISPRpred(SEQ) with those of DeepCRISPR, the current state-of-the-art, which uses a deep learning pipeline. Despite using only traditional machine learning methods, we have been able to convincingly beat DeepCRISPR on three out of four cell lines in the benchmark dataset (2.174%, 6.905% and 8.119% improvement for the three cell lines). CONCLUSION: CRISPRpred(SEQ) has been able to convincingly beat DeepCRISPR in 3 out of 4 cell lines. We believe that, with further exploration, one can design better features using only the sgRNA sequences and arrive at a better method, leveraging only traditional machine learning algorithms, that can fully beat the deep learning models.
BioMed Central 2020-06-01 /pmc/articles/PMC7268231/ /pubmed/32487025 http://dx.doi.org/10.1186/s12859-020-3531-9 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
spellingShingle Methodology Article
Muhammad Rafid, Ali Haisam
Toufikuzzaman, Md.
Rahman, Mohammad Saifur
Rahman, M. Sohel
CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning
title CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning
title_full CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning
title_fullStr CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning
title_full_unstemmed CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning
title_short CRISPRpred(SEQ): a sequence-based method for sgRNA on target activity prediction using traditional machine learning
title_sort crisprpred(seq): a sequence-based method for sgrna on target activity prediction using traditional machine learning
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7268231/
https://www.ncbi.nlm.nih.gov/pubmed/32487025
http://dx.doi.org/10.1186/s12859-020-3531-9