Pretraining strategies for effective promoter-driven gene expression prediction
Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters ar...
Main authors: | Reddy, Aniketh Janardhan; Herschl, Michael H.; Kolli, Sathvik; Lu, Amy X.; Geng, Xinyang; Kumar, Aviral; Hsu, Patrick D.; Levine, Sergey; Ioannidis, Nilah M. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory 2023 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10002662/ https://www.ncbi.nlm.nih.gov/pubmed/36909524 http://dx.doi.org/10.1101/2023.02.24.529941 |
_version_ | 1784904438385737728 |
---|---|
author | Reddy, Aniketh Janardhan; Herschl, Michael H.; Kolli, Sathvik; Lu, Amy X.; Geng, Xinyang; Kumar, Aviral; Hsu, Patrick D.; Levine, Sergey; Ioannidis, Nilah M. |
author_sort | Reddy, Aniketh Janardhan |
collection | PubMed |
description | Advances in gene delivery technologies are enabling rapid progress in molecular medicine, but require precise expression of genetic cargo in desired cell types, which is predominantly achieved via a regulatory DNA sequence called a promoter; however, only a handful of cell type-specific promoters are known. Efficiently designing compact promoter sequences with a high density of regulatory information by leveraging machine learning models would therefore be broadly impactful for fundamental research and direct therapeutic applications. However, models of expression from such compact promoter sequences are lacking, despite the recent success of deep learning in modelling expression from endogenous regulatory sequences. Despite the lack of large datasets measuring promoter-driven expression in many cell types, data from a few well-studied cell types or from endogenous gene expression may provide relevant information for transfer learning, which has not yet been explored in this setting. Here, we evaluate a variety of pretraining tasks and transfer strategies for modelling cell type-specific expression from compact promoters and demonstrate the effectiveness of pretraining on existing promoter-driven expression datasets from other cell types. Our approach is broadly applicable for modelling promoter-driven expression in any data-limited cell type of interest, and will enable the use of model-based optimization techniques for promoter design for gene delivery applications. Our code and data are available at https://github.com/anikethjr/promoter_models. |
format | Online Article Text |
id | pubmed-10002662 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10002662 2023-03-11 Pretraining strategies for effective promoter-driven gene expression prediction bioRxiv Article Cold Spring Harbor Laboratory 2023-02-27 /pmc/articles/PMC10002662/ /pubmed/36909524 http://dx.doi.org/10.1101/2023.02.24.529941 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
title | Pretraining strategies for effective promoter-driven gene expression prediction |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10002662/ https://www.ncbi.nlm.nih.gov/pubmed/36909524 http://dx.doi.org/10.1101/2023.02.24.529941 |