A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets
The CRISPR/Cas9 nuclease from Streptococcus pyogenes (SpCas9) can be used with single guide RNAs (sgRNAs) as a sequence-specific antimicrobial agent and as a genome-engineering tool. However, current bacterial sgRNA activity models struggle with accurate predictions and do not generalize well, possi...
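Main Authors: | Ham, Dalton T.; Browne, Tyler S.; Banglorewala, Pooja N.; Wilson, Tyler L.; Michael, Richard K.; Gloor, Gregory B.; Edgell, David R. |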
---|---|
Format: | Online Article Text |
Language: | English |
Published: |
Nature Publishing Group UK
2023
|
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10485023/ https://www.ncbi.nlm.nih.gov/pubmed/37679324 http://dx.doi.org/10.1038/s41467-023-41143-7 |
_version_ | 1785102703771254784 |
---|---|
author | Ham, Dalton T.; Browne, Tyler S.; Banglorewala, Pooja N.; Wilson, Tyler L.; Michael, Richard K.; Gloor, Gregory B.; Edgell, David R. |
author_sort | Ham, Dalton T. |
collection | PubMed |
description | The CRISPR/Cas9 nuclease from Streptococcus pyogenes (SpCas9) can be used with single guide RNAs (sgRNAs) as a sequence-specific antimicrobial agent and as a genome-engineering tool. However, current bacterial sgRNA activity models struggle with accurate predictions and do not generalize well, possibly because the underlying datasets used to train the models do not accurately measure SpCas9/sgRNA activity and cannot distinguish on-target cleavage from toxicity. Here, we solve this problem by using a two-plasmid positive selection system to generate high-quality data that more accurately reports on SpCas9/sgRNA cleavage and that separates activity from toxicity. We develop a machine learning architecture (crisprHAL) that can be trained on existing datasets, that shows marked improvements in sgRNA activity prediction accuracy when transfer learning is used with small amounts of high-quality data, and that can generalize predictions to different bacteria. The crisprHAL model recapitulates known SpCas9/sgRNA-target DNA interactions and provides a pathway to a generalizable sgRNA bacterial activity prediction tool that will enable accurate antimicrobial and genome engineering applications. |
format | Online Article Text |
id | pubmed-10485023 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-10485023 2023-09-09. Nat Commun. Article. Nature Publishing Group UK, 2023-09-07. /pmc/articles/PMC10485023/ /pubmed/37679324 http://dx.doi.org/10.1038/s41467-023-41143-7 Text en. © The Author(s) 2023. https://creativecommons.org/licenses/by/4.0/ Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. |
title | A generalizable Cas9/sgRNA prediction model using machine transfer learning with small high-quality datasets |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10485023/ https://www.ncbi.nlm.nih.gov/pubmed/37679324 http://dx.doi.org/10.1038/s41467-023-41143-7 |