
EpiGePT: a Pretrained Transformer model for epigenomics

Transformer-based models such as GPT-3(1) and DALL-E(2) have achieved unprecedented breakthroughs in natural language processing and computer vision. The inherent similarities between natural language and biological sequences have prompted a new wave of inferring the grammatical r...

Full description

Bibliographic Details
Main Authors: Gao, Zijing, Liu, Qiao, Zeng, Wanwen, Wong, Wing Hung, Jiang, Rui
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10370089/
https://www.ncbi.nlm.nih.gov/pubmed/37502861
http://dx.doi.org/10.1101/2023.07.15.549134
_version_ 1785077886976262144
author Gao, Zijing
Liu, Qiao
Zeng, Wanwen
Wong, Wing Hung
Jiang, Rui
author_facet Gao, Zijing
Liu, Qiao
Zeng, Wanwen
Wong, Wing Hung
Jiang, Rui
author_sort Gao, Zijing
collection PubMed
description Transformer-based models such as GPT-3(1) and DALL-E(2) have achieved unprecedented breakthroughs in natural language processing and computer vision. The inherent similarities between natural language and biological sequences have prompted a new wave of inferring the grammatical rules underlying biological sequences. In genomic studies, it is worth noting that DNA sequences alone cannot explain all gene activities, due to epigenetic mechanisms. To investigate this problem, we propose EpiGePT, a new transformer-based pretrained language model for epigenomics that predicts genome-wide epigenomic signals by incorporating mechanistic modeling of transcriptional regulation. Specifically, EpiGePT takes the context-specific activities of transcription factors (TFs) into consideration, which could offer deeper biological insights compared to models trained on DNA sequence alone. In a series of experiments, EpiGePT demonstrates state-of-the-art performance on a diverse set of epigenomic signal prediction tasks, as well as on new prediction tasks via fine-tuning. Furthermore, EpiGePT is capable of learning cell-type-specific long-range interactions through the self-attention mechanism and of interpreting genetic variants associated with human diseases. We expect that the advances of EpiGePT can shed light on the complex regulatory mechanisms of gene regulation. We provide a free online EpiGePT prediction service at https://health.tsinghua.edu.cn/epigept/.
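The mechanism sketched in the abstract — combining one-hot DNA sequence with context-specific TF activity features and letting self-attention relate distant genomic bins — can be illustrated in a few lines of NumPy. This is a toy sketch only: all dimensions, weights, and the single-head attention layout are illustrative placeholders, not the architecture actually used by EpiGePT.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical toy dimensions -- not taken from the paper.
L = 8      # number of genomic bins in the input region
D_SEQ = 4  # one-hot DNA channels (A, C, G, T)
N_TF = 6   # number of transcription-factor activity features
D = 16     # embedding width

# Toy inputs: one-hot DNA bins plus cell-type-specific TF activities.
dna = np.eye(D_SEQ)[rng.integers(0, D_SEQ, size=L)]   # (L, D_SEQ)
tf_activity = rng.random(N_TF)                        # (N_TF,)

# Concatenate sequence features with the TF activities (broadcast to
# every bin), then project into the embedding space.
x = np.concatenate([dna, np.tile(tf_activity, (L, 1))], axis=1)
W_in = rng.standard_normal((D_SEQ + N_TF, D)) / np.sqrt(D_SEQ + N_TF)
h = x @ W_in                                          # (L, D)

# Single-head self-attention: every bin attends to every other bin,
# which is how long-range interactions can in principle be captured.
Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
q, k, v = h @ Wq, h @ Wk, h @ Wv
scores = q @ k.T / np.sqrt(D)                         # (L, L)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)               # softmax rows
out = attn @ v                                        # (L, D)

# Linear head predicting one epigenomic signal value per bin.
w_out = rng.standard_normal(D) / np.sqrt(D)
signal = out @ w_out                                  # shape (L,)
print(signal.shape)
```

Changing `tf_activity` while holding the DNA fixed changes the prediction, which mirrors the paper's point that the same sequence can yield different epigenomic signals in different cellular contexts.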
format Online
Article
Text
id pubmed-10370089
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Cold Spring Harbor Laboratory
record_format MEDLINE/PubMed
spelling pubmed-103700892023-07-27 EpiGePT: a Pretrained Transformer model for epigenomics Gao, Zijing Liu, Qiao Zeng, Wanwen Wong, Wing Hung Jiang, Rui bioRxiv Article Transformer-based models such as GPT-3(1) and DALL-E(2) have achieved unprecedented breakthroughs in natural language processing and computer vision. The inherent similarities between natural language and biological sequences have prompted a new wave of inferring the grammatical rules underlying biological sequences. In genomic studies, it is worth noting that DNA sequences alone cannot explain all gene activities, due to epigenetic mechanisms. To investigate this problem, we propose EpiGePT, a new transformer-based pretrained language model for epigenomics that predicts genome-wide epigenomic signals by incorporating mechanistic modeling of transcriptional regulation. Specifically, EpiGePT takes the context-specific activities of transcription factors (TFs) into consideration, which could offer deeper biological insights compared to models trained on DNA sequence alone. In a series of experiments, EpiGePT demonstrates state-of-the-art performance on a diverse set of epigenomic signal prediction tasks, as well as on new prediction tasks via fine-tuning. Furthermore, EpiGePT is capable of learning cell-type-specific long-range interactions through the self-attention mechanism and of interpreting genetic variants associated with human diseases. We expect that the advances of EpiGePT can shed light on the complex regulatory mechanisms of gene regulation. We provide a free online EpiGePT prediction service at https://health.tsinghua.edu.cn/epigept/.
Cold Spring Harbor Laboratory 2023-07-18 /pmc/articles/PMC10370089/ /pubmed/37502861 http://dx.doi.org/10.1101/2023.07.15.549134 Text en https://creativecommons.org/licenses/by-nc-nd/4.0/ This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator.
spellingShingle Article
Gao, Zijing
Liu, Qiao
Zeng, Wanwen
Wong, Wing Hung
Jiang, Rui
EpiGePT: a Pretrained Transformer model for epigenomics
title EpiGePT: a Pretrained Transformer model for epigenomics
title_full EpiGePT: a Pretrained Transformer model for epigenomics
title_fullStr EpiGePT: a Pretrained Transformer model for epigenomics
title_full_unstemmed EpiGePT: a Pretrained Transformer model for epigenomics
title_short EpiGePT: a Pretrained Transformer model for epigenomics
title_sort epigept: a pretrained transformer model for epigenomics
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10370089/
https://www.ncbi.nlm.nih.gov/pubmed/37502861
http://dx.doi.org/10.1101/2023.07.15.549134
work_keys_str_mv AT gaozijing epigeptapretrainedtransformermodelforepigenomics
AT liuqiao epigeptapretrainedtransformermodelforepigenomics
AT zengwanwen epigeptapretrainedtransformermodelforepigenomics
AT wongwinghung epigeptapretrainedtransformermodelforepigenomics
AT jiangrui epigeptapretrainedtransformermodelforepigenomics