
A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions

Deciphering the code of cis-regulatory elements (CREs) is one of the core issues of today’s biology. Enhancers are distal CREs and play significant roles in gene transcriptional regulation. Although identification of enhancer locations across the whole genome [discriminative enhancer prediction (DEP...

Full description

Bibliographic Details
Main Authors: Niu, Xiaohui, Yang, Kun, Zhang, Ge, Yang, Zhiquan, Hu, Xuehai
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6960260/
https://www.ncbi.nlm.nih.gov/pubmed/31969903
http://dx.doi.org/10.3389/fgene.2019.01305
_version_ 1783487755487019008
author Niu, Xiaohui
Yang, Kun
Zhang, Ge
Yang, Zhiquan
Hu, Xuehai
author_facet Niu, Xiaohui
Yang, Kun
Zhang, Ge
Yang, Zhiquan
Hu, Xuehai
author_sort Niu, Xiaohui
collection PubMed
description Deciphering the code of cis-regulatory elements (CREs) is one of the core issues of today’s biology. Enhancers are distal CREs and play significant roles in gene transcriptional regulation. Although identification of enhancer locations across the whole genome [discriminative enhancer prediction (DEP)] is necessary, it is more important to predict in which specific cell or tissue types they will be activated and functional [tissue-specific enhancer prediction (TSEP)]. Although existing deep learning models have achieved great success in DEP, they cannot be directly employed for TSEP because a specific cell or tissue type has only a limited number of enhancer samples available for training. Here, we first adopted a previously reported deep learning architecture and then developed a novel training strategy named the “pretraining-retraining strategy” (PRS) for TSEP by decomposing the whole training process into two successive stages: a pretraining stage trains on the whole enhancer dataset to perform DEP, and a retraining stage then trains on tissue-specific enhancer samples, starting from the trained pretraining model, to make TSEP. As a result, PRS proved valid for DEP, with an AUC of 0.922 and a GM (geometric mean) of 0.696 when tested on a large-scale FANTOM5 enhancer dataset via five-fold cross-validation. Interestingly, starting from the trained pretraining model, only twenty additional epochs were needed to complete the retraining process when testing on 23 specific tissues or cell lines. For TSEP tasks, PRS achieved a mean GM of 0.806, significantly higher than the 0.528 of gkm-SVM, an existing mainstream method for CRE prediction. Notably, PRS also proved superior to two other state-of-the-art methods, DEEP and BiRen. In summary, PRS employs useful ideas from transfer learning and is a reliable method for TSEP.
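The two-stage strategy described in the abstract is essentially transfer learning: fit a model on the pooled enhancer data, then continue training the same weights on the small tissue-specific set for a few extra epochs. Below is a minimal, hedged sketch of that idea using a toy logistic-regression "model" in place of the paper's deep architecture; the `train`/`gm_score` helpers and the synthetic data are illustrative assumptions, not the authors' code. It also implements the GM metric as the geometric mean of sensitivity and specificity, which is the standard definition for this abbreviation.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train(X, y, w=None, epochs=100, lr=0.1):
    """Gradient-descent logistic regression.

    Pass `w` to warm-start from pretrained weights (the "retraining" stage);
    leave it None to train from scratch (the "pretraining" stage).
    """
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def gm_score(y_true, y_pred):
    """Geometric mean of sensitivity and specificity."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    return np.sqrt(sens * spec)

rng = np.random.default_rng(0)

# Stage 1: "pretraining" on a large pooled dataset (stand-in for all enhancers).
X_all = rng.normal(size=(1000, 5))
y_all = (X_all[:, 0] + X_all[:, 1] > 0).astype(int)
w_pre = train(X_all, y_all, epochs=100)

# Stage 2: "retraining" on a small tissue-specific set, warm-started from the
# pretrained weights for only a few extra epochs, as in the abstract.
X_ts = rng.normal(size=(60, 5))
y_ts = (X_ts[:, 0] - X_ts[:, 2] > 0).astype(int)  # a shifted, related task
w_ts = train(X_ts, y_ts, w=w_pre.copy(), epochs=20)

pred = (sigmoid(X_ts @ w_ts) > 0.5).astype(int)
print("tissue-specific GM:", round(float(gm_score(y_ts, pred)), 3))
```

The design point the sketch captures is that stage 2 reuses stage 1's weights rather than reinitializing, which is why only a handful of retraining epochs suffice when the tissue-specific sample is small.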
format Online
Article
Text
id pubmed-6960260
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-6960260 2020-01-22 A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions Niu, Xiaohui Yang, Kun Zhang, Ge Yang, Zhiquan Hu, Xuehai Front Genet Genetics Frontiers Media S.A. 2020-01-08 /pmc/articles/PMC6960260/ /pubmed/31969903 http://dx.doi.org/10.3389/fgene.2019.01305 Text en Copyright © 2020 Niu, Yang, Zhang, Yang and Hu http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Genetics
Niu, Xiaohui
Yang, Kun
Zhang, Ge
Yang, Zhiquan
Hu, Xuehai
A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions
title A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions
title_full A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions
title_fullStr A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions
title_full_unstemmed A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions
title_short A Pretraining-Retraining Strategy of Deep Learning Improves Cell-Specific Enhancer Predictions
title_sort pretraining-retraining strategy of deep learning improves cell-specific enhancer predictions
topic Genetics
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6960260/
https://www.ncbi.nlm.nih.gov/pubmed/31969903
http://dx.doi.org/10.3389/fgene.2019.01305
work_keys_str_mv AT niuxiaohui apretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions
AT yangkun apretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions
AT zhangge apretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions
AT yangzhiquan apretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions
AT huxuehai apretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions
AT niuxiaohui pretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions
AT yangkun pretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions
AT zhangge pretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions
AT yangzhiquan pretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions
AT huxuehai pretrainingretrainingstrategyofdeeplearningimprovescellspecificenhancerpredictions