Computational identification of promoters in Klebsiella aerogenes by using support vector machine

Bibliographic Details
Main authors: Lin, Yan; Sun, Meili; Zhang, Junjie; Li, Mingyan; Yang, Keli; Wu, Chengyan; Zulfiqar, Hasan; Lai, Hongyan
Format: Online article, text
Language: English
Published: Frontiers Media S.A., 2023
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10215528/
https://www.ncbi.nlm.nih.gov/pubmed/37250059
http://dx.doi.org/10.3389/fmicb.2023.1200678
Collection: PubMed

Abstract: Promoters are the basic functional cis-elements to which RNA polymerase binds to initiate gene transcription. Because promoters are the most important component of gene expression, a comprehensive understanding of gene expression and regulation depends on their precise identification. This study aimed to develop a machine learning-based model to predict promoters in Klebsiella aerogenes (K. aerogenes). In the prediction model, promoter sequences from the K. aerogenes genome were encoded by pseudo k-tuple nucleotide composition (PseKNC) and a position-correlation scoring function (PCSF). The resulting numerical features were then optimized by combining minimum redundancy maximum relevance (mRMR) feature selection with a support vector machine (SVM) and 5-fold cross-validation (CV). The optimized features were subsequently input into an SVM-based classifier to discriminate promoter sequences from non-promoter sequences in K. aerogenes. In 10-fold CV, the model achieved an overall accuracy of 96.0% and an area under the ROC curve (AUC) of 0.990. We hope that this model will aid the study of promoters and gene regulation in K. aerogenes.
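
The abstract states that promoter sequences were encoded with pseudo k-tuple nucleotide composition (PseKNC) before SVM classification. As a rough illustration of only the k-tuple part of that encoding, the Python sketch below turns a DNA sequence into a vector of normalized k-mer frequencies. It is not the authors' implementation. The function name `kmer_composition`, the default k = 2, and the choice to skip windows containing non-ACGT symbols are illustrative assumptions. The sketch also omits the sequence-order correlation ("pseudo") terms that PseKNC adds, the PCSF features, mRMR selection, and the SVM itself.

```python
from itertools import product


def kmer_composition(seq: str, k: int = 2) -> list[float]:
    """Return normalized frequencies of all 4**k DNA k-tuples in `seq`.

    Hypothetical helper: plain k-tuple nucleotide composition only,
    without the pseudo (sequence-order correlation) terms of PseKNC.
    """
    seq = seq.upper()
    # Fixed lexicographic feature order: AA, AC, AG, ..., TT for k = 2.
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of length k across the sequence one base at a time.
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # assumption: skip windows with N or other non-ACGT symbols
            counts[window] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]
```

For example, `kmer_composition("ACGTAC", k=2)` counts the five dinucleotides AC, CG, GT, TA and AC. This gives AC a frequency of 0.4 and CG, GT and TA 0.2 each, in a 16-element vector. A matrix of such vectors could then be passed to an SVM classifier, for instance scikit-learn's `sklearn.svm.SVC`, and scored with `sklearn.model_selection.cross_val_score(..., cv=10)`. The record does not say which library or settings the study used.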
Record ID: pubmed-10215528
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: Front Microbiol (Frontiers in Microbiology), Frontiers Media S.A.; publication date 2023-05-05
Copyright © 2023 Lin, Sun, Zhang, Li, Yang, Wu, Zulfiqar and Lai. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.