CRMnet: A deep learning model for predicting gene expression from large regulatory sequence datasets

Recent large datasets measuring the gene expression of millions of possible gene promoter sequences provide a resource to design and train optimized deep neural network architectures to predict expression from sequences. High predictive performance due to the modeling of dependencies within and betw...


Bibliographic Details
Main Authors:	Ding, Ke; Dixit, Gunjan; Parker, Brian J.; Wen, Jiayu
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2023
Subjects:	Big Data
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10043243/ https://www.ncbi.nlm.nih.gov/pubmed/36999047 http://dx.doi.org/10.3389/fdata.2023.1113402

author	Ding, Ke; Dixit, Gunjan; Parker, Brian J.; Wen, Jiayu
collection	PubMed
description	Recent large datasets measuring the gene expression of millions of possible gene promoter sequences provide a resource to design and train optimized deep neural network architectures to predict expression from sequences. High predictive performance due to the modeling of dependencies within and between regulatory sequences is an enabler for biological discoveries in gene regulation through model interpretation techniques. To understand the regulatory code that delineates gene expression, we have designed a novel deep-learning model (CRMnet) to predict gene expression in Saccharomyces cerevisiae. Our model outperforms the current benchmark models and achieves a Pearson correlation coefficient of 0.971 and a mean squared error of 3.200. Interpretation of informative genomic regions determined from model saliency maps, and overlapping the saliency maps with known yeast motifs, supports that our model can successfully locate the binding sites of transcription factors that actively modulate gene expression. We compare our model's training times on a large compute cluster with GPUs and Google TPUs to indicate practical training times on similar datasets.
id	pubmed-10043243
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-10043243 2023-03-29. Front Big Data (Big Data). Frontiers Media S.A. 2023-03-14. Text en. Copyright © 2023 Ding, Dixit, Parker and Wen. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	CRMnet: A deep learning model for predicting gene expression from large regulatory sequence datasets


Similar Items