A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets

The CRISPR/Cas9 nuclease from Streptococcus pyogenes (SpCas9) can be used with single guide RNAs (sgRNAs) as a sequence-specific antimicrobial agent and as a genome-engineering tool. However, current bacterial sgRNA activity models struggle with accurate predictions and do not generalize well, possi...

Bibliographic Details
Main Authors: Ham, Dalton T., Browne, Tyler S., Banglorewala, Pooja N., Wilson, Tyler L., Michael, Richard K., Gloor, Gregory B., Edgell, David R.
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10485023/
https://www.ncbi.nlm.nih.gov/pubmed/37679324
http://dx.doi.org/10.1038/s41467-023-41143-7
_version_ 1785102703771254784
author Ham, Dalton T.
Browne, Tyler S.
Banglorewala, Pooja N.
Wilson, Tyler L.
Michael, Richard K.
Gloor, Gregory B.
Edgell, David R.
author_facet Ham, Dalton T.
Browne, Tyler S.
Banglorewala, Pooja N.
Wilson, Tyler L.
Michael, Richard K.
Gloor, Gregory B.
Edgell, David R.
author_sort Ham, Dalton T.
collection PubMed
description The CRISPR/Cas9 nuclease from Streptococcus pyogenes (SpCas9) can be used with single guide RNAs (sgRNAs) as a sequence-specific antimicrobial agent and as a genome-engineering tool. However, current bacterial sgRNA activity models struggle with accurate predictions and do not generalize well, possibly because the underlying datasets used to train the models do not accurately measure SpCas9/sgRNA activity and cannot distinguish on-target cleavage from toxicity. Here, we solve this problem by using a two-plasmid positive selection system to generate high-quality data that more accurately reports on SpCas9/sgRNA cleavage and that separates activity from toxicity. We develop a machine learning architecture (crisprHAL) that can be trained on existing datasets, that shows marked improvements in sgRNA activity prediction accuracy when transfer learning is used with small amounts of high-quality data, and that can generalize predictions to different bacteria. The crisprHAL model recapitulates known SpCas9/sgRNA-target DNA interactions and provides a pathway to a generalizable sgRNA bacterial activity prediction tool that will enable accurate antimicrobial and genome engineering applications.
format Online
Article
Text
id pubmed-10485023
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-104850232023-09-09 A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets Ham, Dalton T. Browne, Tyler S. Banglorewala, Pooja N. Wilson, Tyler L. Michael, Richard K. Gloor, Gregory B. Edgell, David R. Nat Commun Article The CRISPR/Cas9 nuclease from Streptococcus pyogenes (SpCas9) can be used with single guide RNAs (sgRNAs) as a sequence-specific antimicrobial agent and as a genome-engineering tool. However, current bacterial sgRNA activity models struggle with accurate predictions and do not generalize well, possibly because the underlying datasets used to train the models do not accurately measure SpCas9/sgRNA activity and cannot distinguish on-target cleavage from toxicity. Here, we solve this problem by using a two-plasmid positive selection system to generate high-quality data that more accurately reports on SpCas9/sgRNA cleavage and that separates activity from toxicity. We develop a machine learning architecture (crisprHAL) that can be trained on existing datasets, that shows marked improvements in sgRNA activity prediction accuracy when transfer learning is used with small amounts of high-quality data, and that can generalize predictions to different bacteria. The crisprHAL model recapitulates known SpCas9/sgRNA-target DNA interactions and provides a pathway to a generalizable sgRNA bacterial activity prediction tool that will enable accurate antimicrobial and genome engineering applications. 
Nature Publishing Group UK 2023-09-07 /pmc/articles/PMC10485023/ /pubmed/37679324 http://dx.doi.org/10.1038/s41467-023-41143-7 Text en © The Author(s) 2023 https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Ham, Dalton T.
Browne, Tyler S.
Banglorewala, Pooja N.
Wilson, Tyler L.
Michael, Richard K.
Gloor, Gregory B.
Edgell, David R.
A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets
title A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets
title_full A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets
title_fullStr A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets
title_full_unstemmed A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets
title_short A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets
title_sort generalizable cas9/sgrna prediction model using machine transfer learning with small high-quality datasets
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10485023/
https://www.ncbi.nlm.nih.gov/pubmed/37679324
http://dx.doi.org/10.1038/s41467-023-41143-7