Pretraining strategies for effective promoter-driven gene expression prediction

Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters ar...

Bibliographic Details
Main authors: Reddy, Aniketh Janardhan; Herschl, Michael H.; Kolli, Sathvik; Lu, Amy X.; Geng, Xinyang; Kumar, Aviral; Hsu, Patrick D.; Levine, Sergey; Ioannidis, Nilah M.
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10002662/
https://www.ncbi.nlm.nih.gov/pubmed/36909524
http://dx.doi.org/10.1101/2023.02.24.529941
author Reddy, Aniketh Janardhan
Herschl, Michael H.
Kolli, Sathvik
Lu, Amy X.
Geng, Xinyang
Kumar, Aviral
Hsu, Patrick D.
Levine, Sergey
Ioannidis, Nilah M.
collection PubMed
description Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters are known. Efficiently designing compact promoter sequences with a high density of regulatory information by leveraging machine learning models would therefore be broadly impactful for fundamental research and direct therapeutic applications. However, models of expression from such compact promoter sequences are lacking, despite the recent success of deep learning in modelling expression from endogenous regulatory sequences. Despite the lack of large datasets measuring promoter-driven expression in many cell types, data from a few well-studied cell types or from endogenous gene expression may provide relevant information for transfer learning, which has not yet been explored in this setting. Here, we evaluate a variety of pretraining tasks and transfer strategies for modelling cell type-specific expression from compact promoters and demonstrate the effectiveness of pretraining on existing promoter-driven expression datasets from other cell types. Our approach is broadly applicable for modelling promoter-driven expression in any data-limited cell type of interest, and will enable the use of model-based optimization techniques for promoter design for gene delivery applications. Our code and data are available at https://github.com/anikethjr/promoter_models.
format Online
Article
Text
id pubmed-10002662
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-10002662 2023-03-11 Pretraining strategies for effective promoter-driven gene expression prediction Reddy, Aniketh Janardhan; Herschl, Michael H.; Kolli, Sathvik; Lu, Amy X.; Geng, Xinyang; Kumar, Aviral; Hsu, Patrick D.; Levine, Sergey; Ioannidis, Nilah M. bioRxiv Article
Cold Spring Harbor Laboratory 2023-02-27 /pmc/articles/PMC10002662/ /pubmed/36909524 http://dx.doi.org/10.1101/2023.02.24.529941 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
title Pretraining strategies for effective promoter-driven gene expression prediction
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10002662/
https://www.ncbi.nlm.nih.gov/pubmed/36909524
http://dx.doi.org/10.1101/2023.02.24.529941