iProEP: A Computational Predictor for Predicting Promoter

Promoter is a fundamental DNA element located around the transcription start site (TSS) and could regulate gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene function...

Bibliographic Details
Main Authors: Lai, Hong-Yan; Zhang, Zhao-Yue; Su, Zhen-Dong; Su, Wei; Ding, Hui; Chen, Wei; Lin, Hao
Format: Online Article Text
Language: English
Published: American Society of Gene & Cell Therapy 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6616480/
https://www.ncbi.nlm.nih.gov/pubmed/31299595
http://dx.doi.org/10.1016/j.omtn.2019.05.028
_version_ 1783433518120960000
author Lai, Hong-Yan
Zhang, Zhao-Yue
Su, Zhen-Dong
Su, Wei
Ding, Hui
Chen, Wei
Lin, Hao
author_facet Lai, Hong-Yan
Zhang, Zhao-Yue
Su, Zhen-Dong
Su, Wei
Ding, Hui
Chen, Wei
Lin, Hao
author_sort Lai, Hong-Yan
collection PubMed
description Promoter is a fundamental DNA element located around the transcription start site (TSS) and could regulate gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene functional information. Many models have already been proposed to predict promoters. However, the performances of these methods still need to be improved. In this work, we combined pseudo k-tuple nucleotide composition (PseKNC) with position-correlation scoring function (PCSF) to formulate promoter sequences of Homo sapiens (H. sapiens), Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), Bacillus subtilis (B. subtilis), and Escherichia coli (E. coli). Minimum Redundancy Maximum Relevance (mRMR) algorithm and increment feature selection strategy were then adopted to find out optimal feature subsets. Support vector machine (SVM) was used to distinguish between promoters and non-promoters. In the 10-fold cross-validation test, accuracies of 93.3%, 93.9%, 95.7%, 95.2%, and 93.1% were obtained for H. sapiens, D. melanogaster, C. elegans, B. subtilis, and E. coli, with the areas under receiver operating curves (AUCs) of 0.974, 0.975, 0.981, 0.988, and 0.976, respectively. Comparative results demonstrated that our method outperforms existing methods for identifying promoters. An online web server was established that can be freely accessed (http://lin-group.cn/server/iProEP/).
format Online
Article
Text
id pubmed-6616480
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher American Society of Gene & Cell Therapy
record_format MEDLINE/PubMed
spelling pubmed-66164802019-07-22 iProEP: A Computational Predictor for Predicting Promoter Lai, Hong-Yan Zhang, Zhao-Yue Su, Zhen-Dong Su, Wei Ding, Hui Chen, Wei Lin, Hao Mol Ther Nucleic Acids Article Promoter is a fundamental DNA element located around the transcription start site (TSS) and could regulate gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene functional information. Many models have already been proposed to predict promoters. However, the performances of these methods still need to be improved. In this work, we combined pseudo k-tuple nucleotide composition (PseKNC) with position-correlation scoring function (PCSF) to formulate promoter sequences of Homo sapiens (H. sapiens), Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), Bacillus subtilis (B. subtilis), and Escherichia coli (E. coli). Minimum Redundancy Maximum Relevance (mRMR) algorithm and increment feature selection strategy were then adopted to find out optimal feature subsets. Support vector machine (SVM) was used to distinguish between promoters and non-promoters. In the 10-fold cross-validation test, accuracies of 93.3%, 93.9%, 95.7%, 95.2%, and 93.1% were obtained for H. sapiens, D. melanogaster, C. elegans, B. subtilis, and E. coli, with the areas under receiver operating curves (AUCs) of 0.974, 0.975, 0.981, 0.988, and 0.976, respectively. Comparative results demonstrated that our method outperforms existing methods for identifying promoters. An online web server was established that can be freely accessed (http://lin-group.cn/server/iProEP/). 
American Society of Gene & Cell Therapy 2019-06-13 /pmc/articles/PMC6616480/ /pubmed/31299595 http://dx.doi.org/10.1016/j.omtn.2019.05.028 Text en © 2019 The Author(s) http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
spellingShingle Article
Lai, Hong-Yan
Zhang, Zhao-Yue
Su, Zhen-Dong
Su, Wei
Ding, Hui
Chen, Wei
Lin, Hao
iProEP: A Computational Predictor for Predicting Promoter
title iProEP: A Computational Predictor for Predicting Promoter
title_full iProEP: A Computational Predictor for Predicting Promoter
title_fullStr iProEP: A Computational Predictor for Predicting Promoter
title_full_unstemmed iProEP: A Computational Predictor for Predicting Promoter
title_short iProEP: A Computational Predictor for Predicting Promoter
title_sort iproep: a computational predictor for predicting promoter
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6616480/
https://www.ncbi.nlm.nih.gov/pubmed/31299595
http://dx.doi.org/10.1016/j.omtn.2019.05.028
work_keys_str_mv AT laihongyan iproepacomputationalpredictorforpredictingpromoter
AT zhangzhaoyue iproepacomputationalpredictorforpredictingpromoter
AT suzhendong iproepacomputationalpredictorforpredictingpromoter
AT suwei iproepacomputationalpredictorforpredictingpromoter
AT dinghui iproepacomputationalpredictorforpredictingpromoter
AT chenwei iproepacomputationalpredictorforpredictingpromoter
AT linhao iproepacomputationalpredictorforpredictingpromoter
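
The description field above mentions encoding promoter sequences with k-tuple nucleotide composition before feature selection and SVM classification. As a rough illustration of that first encoding step only, the following sketch computes plain normalized k-tuple frequencies for a DNA sequence. It is an assumption-laden simplification: the function name `ktuple_composition` is hypothetical, and the sketch omits the pseudo (correlation) components of PseKNC, the PCSF scores, mRMR ranking, and the SVM, none of which are specified in this record.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return normalized k-tuple (k-mer) frequencies of a DNA sequence.

    Illustrative assumption: this is plain k-tuple composition, not the
    full PseKNC or PCSF encoding named in the abstract.
    """
    seq = seq.upper()
    # Fixed feature order: all 4**k tuples over A, C, G, T, lexicographic.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k along the sequence.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows containing N or other symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    # Dinucleotide composition of a short made-up sequence.
    vector = ktuple_composition("TATAATGCGC", k=2)
    print(len(vector), round(sum(vector), 6))
```

Each sequence maps to a vector of length 4^k, so vectors for different sequences are directly comparable and could be passed to any downstream feature-selection or classification step.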