
Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique

Bibliographic Details
Main authors: Zulfiqar, Hasan, Ahmed, Zahoor, Kissanga Grace-Mercure, Bakanina, Hassan, Farwa, Zhang, Zhao-Yue, Liu, Fen
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2023
Subjects: Microbiology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10133480/
https://www.ncbi.nlm.nih.gov/pubmed/37125199
http://dx.doi.org/10.3389/fmicb.2023.1170785
author Zulfiqar, Hasan
Ahmed, Zahoor
Kissanga Grace-Mercure, Bakanina
Hassan, Farwa
Zhang, Zhao-Yue
Liu, Fen
collection PubMed
description Promoters are genomic regions upstream of genes that are bound by RNA polymerase to initiate transcription. Because the promoter is a critical element of gene expression, recognizing promoters is crucial for understanding how gene expression is regulated. This study aimed to develop a machine learning-based model to predict promoters in Agrobacterium tumefaciens (A. tumefaciens) strain C58. In the model, promoter sequences were encoded with three kinds of feature descriptors: accumulated nucleotide frequency, k-mer nucleotide composition, and binary encoding. The resulting features were optimized using correlation and the minimum redundancy maximum relevance (mRMR) algorithm. The optimized features were then fed into a random forest (RF) classifier to discriminate promoter sequences from non-promoter sequences in A. tumefaciens strain C58. Under 10-fold cross-validation, the proposed model achieved an overall accuracy of 0.837. The model is intended to support the study of promoters in A. tumefaciens strain C58.
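As an illustration of the three sequence descriptors named in the abstract, the following minimal Python sketch encodes a DNA sequence with binary (one-hot) encoding, accumulated nucleotide frequency (ANF), and k-mer nucleotide composition. It assumes the A, C, G, T ordering, the common ANF definition (the density of the current nucleotide within the prefix ending at that position), and k = 3 by default; the function names are hypothetical and are not taken from the authors' code.

```python
from itertools import product

ALPHABET = "ACGT"  # assumed nucleotide order for every encoding below


def binary_encoding(seq: str) -> list[int]:
    """One-hot encode each nucleotide as a 4-bit vector in A, C, G, T order.
    Symbols outside ACGT (e.g. N) become all zeros."""
    vec: list[int] = []
    for base in seq.upper():
        vec.extend(1 if base == b else 0 for b in ALPHABET)
    return vec


def accumulated_nucleotide_frequency(seq: str) -> list[float]:
    """ANF: at 1-based position i, the fraction of seq[:i] made up of
    the nucleotide found at position i."""
    counts: dict[str, int] = {}
    out: list[float] = []
    for i, base in enumerate(seq.upper(), start=1):
        counts[base] = counts.get(base, 0) + 1
        out.append(counts[base] / i)
    return out


def kmer_composition(seq: str, k: int = 3) -> list[float]:
    """Normalized frequency of each of the 4**k possible k-mers,
    counted over all overlapping windows of length k."""
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    index = {km: j for j, km in enumerate(kmers)}
    counts = [0] * len(kmers)
    total = len(seq) - k + 1
    if total <= 0:
        return [0.0] * len(kmers)
    for i in range(total):
        j = index.get(seq[i:i + k])
        if j is not None:  # windows containing non-ACGT symbols are skipped
            counts[j] += 1
    return [c / total for c in counts]


def encode(seq: str, k: int = 3) -> list[float]:
    """Concatenate the three descriptors into one feature vector."""
    return ([float(x) for x in binary_encoding(seq)]
            + accumulated_nucleotide_frequency(seq)
            + kmer_composition(seq, k))


if __name__ == "__main__":
    print(accumulated_nucleotide_frequency("ACGA"))  # density of A, C, G, A in each prefix
    print(len(encode("ACGA", k=2)))
```

For a sequence of length L, `encode` returns 4L + L + 4^k values, a fairly high-dimensional vector, which motivates a feature-selection step before the classifier.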
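The abstract names mRMR but does not state which mRMR variant was used or how the correlation step was combined with it. Purely as a hedged illustration, the sketch below implements greedy mRMR in its common difference ("MID") form: at each step it picks the feature with the highest mutual information with the class label minus its mean mutual information with the features already chosen. The equal-width `discretize` helper, the bin count, and all function names are hypothetical assumptions, and the correlation step is not shown.

```python
from collections import Counter
from math import log


def discretize(col: list[float], bins: int = 5) -> list[int]:
    """Hypothetical equal-width binning so continuous features (ANF, k-mer
    frequencies) can be used with a count-based mutual information estimate."""
    lo, hi = min(col), max(col)
    if hi == lo:
        return [0] * len(col)
    width = (hi - lo) / bins
    return [min(int((v - lo) / width), bins - 1) for v in col]


def mutual_information(x: list, y: list) -> float:
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    # p(a,b) * log(p(a,b) / (p(a) p(b))) with all probabilities written as counts / n
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_select(features: list[list], labels: list, k: int) -> list[int]:
    """Greedy mRMR (MID form). `features` is a list of discrete feature columns;
    returns the indices of up to k selected columns in selection order."""
    relevance = [mutual_information(col, labels) for col in features]
    selected: list[int] = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        def score(j: int) -> float:
            if not selected:
                return relevance[j]
            redundancy = sum(mutual_information(features[j], features[s]) for s in selected)
            return relevance[j] - redundancy / len(selected)

        best = max(remaining, key=score)  # ties go to the lowest remaining index
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    y = [0, 0, 0, 0, 1, 1, 1, 1]
    f0 = [0, 0, 0, 1, 1, 1, 1, 1]  # relevant feature
    f1 = list(f0)                  # exact duplicate of f0 (fully redundant)
    f2 = [0, 0, 0, 0, 1, 1, 1, 0]  # relevant, with errors on different samples
    print(mrmr_select([f0, f1, f2], y, k=2))
```

In the toy run, the duplicate column is penalized by its redundancy with the first pick, so the two selected features are never the redundant pair. The selected columns would then be passed to the classifier; the random forest and 10-fold cross-validation reported in the abstract are not reproduced here.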
format Online
Article
Text
id pubmed-10133480
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
journal Front Microbiol (Frontiers Media S.A.), 2023-04-13
rights Copyright © 2023 Zulfiqar, Ahmed, Kissanga Grace-Mercure, Hassan, Zhang and Liu. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY), https://creativecommons.org/licenses/by/4.0/. The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title Computational prediction of promotors in Agrobacterium tumefaciens strain C58 by using the machine learning technique
topic Microbiology