iProEP: A Computational Predictor for Predicting Promoter
A promoter is a fundamental DNA element located around the transcription start site (TSS) that can regulate gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene function...
Main authors: | Lai, Hong-Yan; Zhang, Zhao-Yue; Su, Zhen-Dong; Su, Wei; Ding, Hui; Chen, Wei; Lin, Hao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Society of Gene & Cell Therapy, 2019 |
Subjects: | Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6616480/ https://www.ncbi.nlm.nih.gov/pubmed/31299595 http://dx.doi.org/10.1016/j.omtn.2019.05.028 |
_version_ | 1783433518120960000 |
---|---|
author | Lai, Hong-Yan Zhang, Zhao-Yue Su, Zhen-Dong Su, Wei Ding, Hui Chen, Wei Lin, Hao |
author_facet | Lai, Hong-Yan Zhang, Zhao-Yue Su, Zhen-Dong Su, Wei Ding, Hui Chen, Wei Lin, Hao |
author_sort | Lai, Hong-Yan |
collection | PubMed |
description | A promoter is a fundamental DNA element located around the transcription start site (TSS) that can regulate gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene functional information. Many models have already been proposed to predict promoters; however, their performance still needs to be improved. In this work, we combined pseudo k-tuple nucleotide composition (PseKNC) with a position-correlation scoring function (PCSF) to formulate promoter sequences of Homo sapiens (H. sapiens), Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), Bacillus subtilis (B. subtilis), and Escherichia coli (E. coli). The Minimum Redundancy Maximum Relevance (mRMR) algorithm and an incremental feature selection strategy were then adopted to identify optimal feature subsets. A support vector machine (SVM) was used to distinguish between promoters and non-promoters. In the 10-fold cross-validation test, accuracies of 93.3%, 93.9%, 95.7%, 95.2%, and 93.1% were obtained for H. sapiens, D. melanogaster, C. elegans, B. subtilis, and E. coli, with areas under the receiver operating characteristic curves (AUCs) of 0.974, 0.975, 0.981, 0.988, and 0.976, respectively. Comparative results demonstrated that our method outperforms existing methods for identifying promoters. An online web server was established that can be freely accessed (http://lin-group.cn/server/iProEP/). |
format | Online Article Text |
id | pubmed-6616480 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | American Society of Gene & Cell Therapy |
record_format | MEDLINE/PubMed |
spelling | pubmed-66164802019-07-22 iProEP: A Computational Predictor for Predicting Promoter Lai, Hong-Yan Zhang, Zhao-Yue Su, Zhen-Dong Su, Wei Ding, Hui Chen, Wei Lin, Hao Mol Ther Nucleic Acids Article Promoter is a fundamental DNA element located around the transcription start site (TSS) and could regulate gene transcription. Promoter recognition is of great significance in determining transcription units, studying gene structure, analyzing gene regulation mechanisms, and annotating gene functional information. Many models have already been proposed to predict promoters. However, the performances of these methods still need to be improved. In this work, we combined pseudo k-tuple nucleotide composition (PseKNC) with position-correlation scoring function (PCSF) to formulate promoter sequences of Homo sapiens (H. sapiens), Drosophila melanogaster (D. melanogaster), Caenorhabditis elegans (C. elegans), Bacillus subtilis (B. subtilis), and Escherichia coli (E. coli). Minimum Redundancy Maximum Relevance (mRMR) algorithm and increment feature selection strategy were then adopted to find out optimal feature subsets. Support vector machine (SVM) was used to distinguish between promoters and non-promoters. In the 10-fold cross-validation test, accuracies of 93.3%, 93.9%, 95.7%, 95.2%, and 93.1% were obtained for H. sapiens, D. melanogaster, C. elegans, B. subtilis, and E. coli, with the areas under receiver operating curves (AUCs) of 0.974, 0.975, 0.981, 0.988, and 0.976, respectively. Comparative results demonstrated that our method outperforms existing methods for identifying promoters. An online web server was established that can be freely accessed (http://lin-group.cn/server/iProEP/). 
American Society of Gene & Cell Therapy 2019-06-13 /pmc/articles/PMC6616480/ /pubmed/31299595 http://dx.doi.org/10.1016/j.omtn.2019.05.028 Text en © 2019 The Author(s) http://creativecommons.org/licenses/by-nc-nd/4.0/ This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/). |
spellingShingle | Article Lai, Hong-Yan Zhang, Zhao-Yue Su, Zhen-Dong Su, Wei Ding, Hui Chen, Wei Lin, Hao iProEP: A Computational Predictor for Predicting Promoter |
title | iProEP: A Computational Predictor for Predicting Promoter |
title_full | iProEP: A Computational Predictor for Predicting Promoter |
title_fullStr | iProEP: A Computational Predictor for Predicting Promoter |
title_full_unstemmed | iProEP: A Computational Predictor for Predicting Promoter |
title_short | iProEP: A Computational Predictor for Predicting Promoter |
title_sort | iproep: a computational predictor for predicting promoter |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6616480/ https://www.ncbi.nlm.nih.gov/pubmed/31299595 http://dx.doi.org/10.1016/j.omtn.2019.05.028 |
work_keys_str_mv | AT laihongyan iproepacomputationalpredictorforpredictingpromoter AT zhangzhaoyue iproepacomputationalpredictorforpredictingpromoter AT suzhendong iproepacomputationalpredictorforpredictingpromoter AT suwei iproepacomputationalpredictorforpredictingpromoter AT dinghui iproepacomputationalpredictorforpredictingpromoter AT chenwei iproepacomputationalpredictorforpredictingpromoter AT linhao iproepacomputationalpredictorforpredictingpromoter |
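
The description field above mentions encoding promoter sequences with k-tuple nucleotide composition before classification. As a rough illustration of that idea only, the following Python sketch computes plain normalized k-tuple frequencies for a DNA sequence. It is not the authors' PseKNC implementation: the pseudo (physicochemical correlation) components, PCSF, mRMR, and the SVM are all omitted, and the function name `ktuple_composition` and the example sequence are hypothetical.

```python
from itertools import product


def ktuple_composition(seq, k=2):
    """Return normalized frequencies of all 4**k k-tuples in a DNA sequence.

    Hypothetical helper for illustration; covers only the plain k-tuple
    composition, not the pseudo components used in PseKNC.
    """
    seq = seq.upper()
    # Fixed lexicographic ordering of k-tuples: AA, AC, AG, AT, CA, ...
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k over the sequence.
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Example: dinucleotide composition of a short, made-up sequence.
features = ktuple_composition("ACGTAC", k=2)
print(len(features))  # one feature per possible dinucleotide
```

A feature vector of this kind could then be passed to any standard classifier; the choice of k and the handling of ambiguous bases here are assumptions made for the sketch.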