Computational identification of promoters in Klebsiella aerogenes by using support vector machine
Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate gene transcription. A comprehensive understanding of gene expression and regulation depends on the precise identification of promoters, as they are the most important component of gene expression. Thi...
Main authors: | Lin, Yan; Sun, Meili; Zhang, Junjie; Li, Mingyan; Yang, Keli; Wu, Chengyan; Zulfiqar, Hasan; Lai, Hongyan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Frontiers Media S.A. 2023 |
Subjects: | Microbiology |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10215528/ https://www.ncbi.nlm.nih.gov/pubmed/37250059 http://dx.doi.org/10.3389/fmicb.2023.1200678 |
_version_ | 1785048084762329088 |
---|---|
author | Lin, Yan Sun, Meili Zhang, Junjie Li, Mingyan Yang, Keli Wu, Chengyan Zulfiqar, Hasan Lai, Hongyan |
author_sort | Lin, Yan |
collection | PubMed |
description | Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate gene transcription. A comprehensive understanding of gene expression and regulation depends on the precise identification of promoters, as they are the most important component of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Klebsiella aerogenes (K. aerogenes). In the prediction model, the promoter sequences in the K. aerogenes genome were encoded by pseudo k-tuple nucleotide composition (PseKNC) and a position-correlation scoring function (PCSF). The resulting numerical features were then optimized using minimum redundancy maximum relevance (mRMR) combined with a support vector machine (SVM) and 5-fold cross-validation (CV). Subsequently, the optimized features were fed into an SVM-based classifier to discriminate promoter sequences from non-promoter sequences in K. aerogenes. In 10-fold CV, the model achieved an overall accuracy of 96.0% and an area under the ROC curve (AUC) of 0.990. We hope that this model will help the study of promoters and gene regulation in K. aerogenes. |
format | Online Article Text |
id | pubmed-10215528 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Frontiers Media S.A. |
record_format | MEDLINE/PubMed |
spelling | pubmed-10215528 2023-05-27 Computational identification of promoters in Klebsiella aerogenes by using support vector machine Lin, Yan Sun, Meili Zhang, Junjie Li, Mingyan Yang, Keli Wu, Chengyan Zulfiqar, Hasan Lai, Hongyan Front Microbiol Microbiology Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate gene transcription. A comprehensive understanding of gene expression and regulation depends on the precise identification of promoters, as they are the most important component of gene expression. This study aimed to develop a machine learning-based model to predict promoters in Klebsiella aerogenes (K. aerogenes). In the prediction model, the promoter sequences in the K. aerogenes genome were encoded by pseudo k-tuple nucleotide composition (PseKNC) and a position-correlation scoring function (PCSF). The resulting numerical features were then optimized using minimum redundancy maximum relevance (mRMR) combined with a support vector machine (SVM) and 5-fold cross-validation (CV). Subsequently, the optimized features were fed into an SVM-based classifier to discriminate promoter sequences from non-promoter sequences in K. aerogenes. In 10-fold CV, the model achieved an overall accuracy of 96.0% and an area under the ROC curve (AUC) of 0.990. We hope that this model will help the study of promoters and gene regulation in K. aerogenes. Frontiers Media S.A. 2023-05-05 /pmc/articles/PMC10215528/ /pubmed/37250059 http://dx.doi.org/10.3389/fmicb.2023.1200678 Text en Copyright © 2023 Lin, Sun, Zhang, Li, Yang, Wu, Zulfiqar and Lai. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms. |
spellingShingle | Microbiology Lin, Yan Sun, Meili Zhang, Junjie Li, Mingyan Yang, Keli Wu, Chengyan Zulfiqar, Hasan Lai, Hongyan Computational identification of promoters in Klebsiella aerogenes by using support vector machine |
title | Computational identification of promoters in Klebsiella aerogenes by using support vector machine |
title_sort | computational identification of promoters in klebsiella aerogenes by using support vector machine |
topic | Microbiology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10215528/ https://www.ncbi.nlm.nih.gov/pubmed/37250059 http://dx.doi.org/10.3389/fmicb.2023.1200678 |
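
The abstract in this record says that sequences are turned into numerical features before being passed to an SVM. As a rough illustration of that encoding step only, the sketch below computes plain normalized k-tuple (k-mer) nucleotide frequencies. This is a simplification and not the authors' method: full PseKNC also adds physicochemical correlation terms, and PCSF, mRMR feature selection, and the SVM itself are not shown. The function name `kmer_composition` and the choice to skip windows containing ambiguous bases are assumptions made for this example.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Return normalized k-tuple frequencies in lexicographic A, C, G, T order.

    Hypothetical helper for illustration; not taken from the cited article.
    """
    seq = seq.upper()
    # All 4**k possible k-tuples, e.g. AA, AC, AG, AT, CA, ... for k=2.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of width k across the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with bases such as N
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


# Example: a 16-dimensional dinucleotide feature vector for a short sequence.
features = kmer_composition("TATAAT", k=2)
```

Each sequence becomes a fixed-length vector (4^k entries), which is the kind of input a classifier such as an SVM expects regardless of the original sequence length.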