A successful hybrid deep learning model aiming at promoter identification

BACKGROUND: The zone adjacent to a transcription start site (TSS), namely, the promoter, is primarily involved in the process of DNA transcription initiation and regulation. As a result, proper promoter identification is critical for further understanding the mechanism of the networks controlling ge...

Bibliographic Details
Main Authors: Wang, Ying, Peng, Qinke, Mou, Xu, Wang, Xinyuan, Li, Haozhou, Han, Tian, Sun, Zhao, Wang, Xiao
Format: Online Article Text
Language: English
Published: BioMed Central 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9158169/
https://www.ncbi.nlm.nih.gov/pubmed/35641900
http://dx.doi.org/10.1186/s12859-022-04735-6
author Wang, Ying
Peng, Qinke
Mou, Xu
Wang, Xinyuan
Li, Haozhou
Han, Tian
Sun, Zhao
Wang, Xiao
author_sort Wang, Ying
collection PubMed
description BACKGROUND: The zone adjacent to a transcription start site (TSS), namely, the promoter, is primarily involved in the process of DNA transcription initiation and regulation. As a result, proper promoter identification is critical for further understanding the mechanism of the networks controlling genomic regulation. A number of methodologies for the identification of promoters have been proposed. Nonetheless, due to the great heterogeneity existing in promoters, the results of these procedures are still unsatisfactory. In order to establish additional discriminative characteristics and properly recognize promoters, we developed the hybrid model for promoter identification (HMPI), a hybrid deep learning model that can characterize both the native sequences of promoters and the morphological outline of promoters at the same time. We developed the HMPI to combine a method called the PSFN (promoter sequence features network), which characterizes native promoter sequences and deduces sequence features, with a technique referred to as the DSPN (deep structural profiles network), which is specially structured to model the promoters in terms of their structural profile and to deduce their structural attributes. RESULTS: The HMPI was applied to human, plant and Escherichia coli K-12 strain datasets, and the findings showed that the HMPI was successful at extracting the features of the promoter while greatly enhancing the promoter identification performance. In addition, after the improvements of synthetic sampling, transfer learning and label smoothing regularization, the improved HMPI models achieved good results in identifying subtypes of promoters on prokaryotic promoter datasets. 
CONCLUSIONS: The results showed that the HMPI was successful at extracting the features of promoters while greatly enhancing the performance of identifying promoters on both eukaryotic and prokaryotic datasets, and the improved HMPI models are good at identifying subtypes of promoters on prokaryotic promoter datasets. The HMPI is additionally adaptable to different biological functional sequences, allowing for the addition of new features or models. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04735-6.
format Online
Article
Text
id pubmed-9158169
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-9158169 2022-06-02 A successful hybrid deep learning model aiming at promoter identification Wang, Ying Peng, Qinke Mou, Xu Wang, Xinyuan Li, Haozhou Han, Tian Sun, Zhao Wang, Xiao BMC Bioinformatics Research BioMed Central 2022-05-31 /pmc/articles/PMC9158169/ /pubmed/35641900 http://dx.doi.org/10.1186/s12859-022-04735-6 Text en © The Author(s) 2022 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title A successful hybrid deep learning model aiming at promoter identification
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9158169/
https://www.ncbi.nlm.nih.gov/pubmed/35641900
http://dx.doi.org/10.1186/s12859-022-04735-6