EpiGePT: a Pretrained Transformer model for epigenomics
Transformer-based models, such as GPT-3 (1) and DALL-E (2), have achieved unprecedented breakthroughs in the fields of natural language processing and computer vision. The inherent similarities between natural language and biological sequences have prompted a new wave of work on inferring the grammatical r...
Main Authors: | Gao, Zijing; Liu, Qiao; Zeng, Wanwen; Wong, Wing Hung; Jiang, Rui |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Cold Spring Harbor Laboratory, 2023 |
Subjects: | Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10370089/ https://www.ncbi.nlm.nih.gov/pubmed/37502861 http://dx.doi.org/10.1101/2023.07.15.549134 |
_version_ | 1785077886976262144 |
author | Gao, Zijing Liu, Qiao Zeng, Wanwen Wong, Wing Hung Jiang, Rui |
author_facet | Gao, Zijing Liu, Qiao Zeng, Wanwen Wong, Wing Hung Jiang, Rui |
author_sort | Gao, Zijing |
collection | PubMed |
description | Transformer-based models, such as GPT-3 (1) and DALL-E (2), have achieved unprecedented breakthroughs in the fields of natural language processing and computer vision. The inherent similarities between natural language and biological sequences have prompted a new wave of work on inferring the grammatical rules underlying biological sequences. In genomic studies, it is worth noting that DNA sequences alone cannot explain all gene activities, owing to epigenetic mechanisms. To investigate this problem, we propose EpiGePT, a new pretrained transformer-based language model for epigenomics that predicts genome-wide epigenomic signals by mechanistically modeling transcriptional regulation. Specifically, EpiGePT takes the context-specific activities of transcription factors (TFs) into consideration, which can offer deeper biological insights compared to models trained on DNA sequence alone. In a series of experiments, EpiGePT demonstrates state-of-the-art performance on a diverse set of epigenomic signal prediction tasks, as well as on new prediction tasks via fine-tuning. Furthermore, EpiGePT can learn cell-type-specific long-range interactions through its self-attention mechanism and interpret genetic variants associated with human diseases. We expect that EpiGePT will shed light on the complex regulatory mechanisms of gene regulation. We provide a free online prediction service for EpiGePT at https://health.tsinghua.edu.cn/epigept/. |
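The abstract above describes the core idea of combining DNA-sequence representations with context-specific TF activities and letting self-attention relate distant genomic bins. The following is a minimal NumPy sketch of that input-fusion pattern, not the authors' implementation: all dimensions, weights, and variable names (`seq_emb`, `tf_activity`, `W_in`, etc.) are illustrative assumptions, with random values standing in for learned parameters.

```python
# Hypothetical sketch of an EpiGePT-style architecture (illustration only):
# each genomic bin is represented by a DNA-sequence embedding concatenated
# with context-specific TF activity scores; one self-attention layer then
# mixes information across bins before per-bin signal prediction.
import numpy as np

rng = np.random.default_rng(0)

n_bins, d_seq, n_tfs, d_model, n_tracks = 8, 16, 4, 12, 3

# Assumed inputs: per-bin sequence embeddings (e.g. from a CNN over
# one-hot DNA) and per-cell-type TF activities broadcast to every bin.
seq_emb = rng.normal(size=(n_bins, d_seq))
tf_activity = rng.normal(size=(n_tfs,))
x = np.concatenate([seq_emb, np.tile(tf_activity, (n_bins, 1))], axis=1)

# Random projections stand in for learned parameters.
W_in = rng.normal(size=(d_seq + n_tfs, d_model))
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
W_out = rng.normal(size=(d_model, n_tracks))

h = x @ W_in
q, k, v = h @ W_q, h @ W_k, h @ W_v

# Scaled dot-product self-attention: the bin-by-bin attention matrix is
# the quantity one would inspect to read off long-range interactions.
scores = q @ k.T / np.sqrt(d_model)
attn = np.exp(scores - scores.max(axis=1, keepdims=True))
attn /= attn.sum(axis=1, keepdims=True)

signals = (attn @ v) @ W_out   # one predicted value per bin per track
print(signals.shape)           # (8, 3)
```

Because TF activities vary by cell type while the sequence is fixed, the same genomic window yields different predictions in different cellular contexts, which is the mechanism the abstract credits for cell-type-specific predictions.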
format | Online Article Text |
id | pubmed-10370089 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-10370089 2023-07-27 EpiGePT: a Pretrained Transformer model for epigenomics. Gao, Zijing; Liu, Qiao; Zeng, Wanwen; Wong, Wing Hung; Jiang, Rui. bioRxiv Article. [Abstract as given in the description field above.] Cold Spring Harbor Laboratory 2023-07-18 /pmc/articles/PMC10370089/ /pubmed/37502861 http://dx.doi.org/10.1101/2023.07.15.549134 Text (English). This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nc-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, for noncommercial purposes only, and only so long as attribution is given to the creator. |
spellingShingle | Article Gao, Zijing Liu, Qiao Zeng, Wanwen Wong, Wing Hung Jiang, Rui EpiGePT: a Pretrained Transformer model for epigenomics |
title | EpiGePT: a Pretrained Transformer model for epigenomics |
title_full | EpiGePT: a Pretrained Transformer model for epigenomics |
title_fullStr | EpiGePT: a Pretrained Transformer model for epigenomics |
title_full_unstemmed | EpiGePT: a Pretrained Transformer model for epigenomics |
title_short | EpiGePT: a Pretrained Transformer model for epigenomics |
title_sort | epigept: a pretrained transformer model for epigenomics |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10370089/ https://www.ncbi.nlm.nih.gov/pubmed/37502861 http://dx.doi.org/10.1101/2023.07.15.549134 |
work_keys_str_mv | AT gaozijing epigeptapretrainedtransformermodelforepigenomics AT liuqiao epigeptapretrainedtransformermodelforepigenomics AT zengwanwen epigeptapretrainedtransformermodelforepigenomics AT wongwinghung epigeptapretrainedtransformermodelforepigenomics AT jiangrui epigeptapretrainedtransformermodelforepigenomics |