Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria

This paper presents a prediction of Bacillus subtilis promoters using a Support Vector Machine system. In the literature, there is a lack of information on Gram-positive bacterial promoter sequences compared to Gram-negative bacteria. Promoter sequence identification is essential for studying gene e...

Bibliographic Details
Main authors: Coelho, Rafael Vieira; de Avila e Silva, Scheila; Echeverrigaray, Sergio; Delamare, Ana Paula Longaray
Format: Online Article Text
Language: English
Published: Elsevier 2018
Subjects: Genetics, Genomics and Molecular Biology
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5993011/
https://www.ncbi.nlm.nih.gov/pubmed/29892645
http://dx.doi.org/10.1016/j.dib.2018.05.025
author Coelho, Rafael Vieira
de Avila e Silva, Scheila
Echeverrigaray, Sergio
Delamare, Ana Paula Longaray
collection PubMed
description This paper presents a prediction of Bacillus subtilis promoters using a Support Vector Machine system. In the literature, there is a lack of information on Gram-positive bacterial promoter sequences compared to Gram-negative bacteria. Promoter sequence identification is essential for studying gene expression. Initially, we collected the B. subtilis genome sequence from the NCBI database, and promoters were identified by their sigma factors in the DBTBS database. We then grouped the promoters according to 15 factors in 2 domains, corresponding to sigma 54 and sigma 70 of Gram-negative bacteria. Based on these data we developed a script in Python to search for promoters in the B. subtilis genome. After processing the data, we obtained 767 promoter sequences for B. subtilis, most of which were recognized by sigma SigA. To validate the data we found, we developed a software package called BacSVM+, which receives promoters as input and returns the best combination of parameters in a LibSVM library to predict promoter regions in the bacteria used in the simulation. All data gathered as well as the BacSVM+ software is available for download at http://bacpp.bioinfoucs.com/rafael/Sigmas.zip.
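The description mentions a Python script that searches for promoters in the B. subtilis genome, but the script itself is not reproduced in this record. The following is only a minimal sketch of one plausible approach, assuming the known promoter sequences are located by exact matching on both strands. The function names `reverse_complement` and `find_promoter` and the toy sequences are hypothetical and are not taken from the authors' code, the NCBI genome, or DBTBS.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A/C/G/T/N only)."""
    complement = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(complement)[::-1]


def find_promoter(genome: str, promoter: str):
    """Yield (start, strand) for every exact occurrence of `promoter`.

    `start` is a 0-based index on the forward strand. Strand "+" means the
    promoter matches the genome as given; "-" means its reverse complement
    matches, i.e. the promoter lies on the opposite strand.
    """
    genome = genome.upper()
    queries = (("+", promoter.upper()), ("-", reverse_complement(promoter)))
    for strand, query in queries:
        pos = genome.find(query)
        while pos != -1:
            yield pos, strand
            # Continue one base further so overlapping matches are reported.
            pos = genome.find(query, pos + 1)


# Toy data for illustration only (assumption: not real B. subtilis sequence).
toy_promoter = "TTGACATTTTATAAT"
toy_genome = (
    "GGCC" + toy_promoter + "GGCC" + reverse_complement(toy_promoter) + "GG"
)

hits = list(find_promoter(toy_genome, toy_promoter))
for start, strand in hits:
    print(f"match at {start} on strand {strand}")
```

Exact matching is the simplest possible choice; a real pipeline would likely also need to handle ambiguous bases and a circular chromosome, which this sketch ignores.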
format Online
Article
Text
id pubmed-5993011
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Elsevier
record_format MEDLINE/PubMed
spelling pubmed-5993011 2018-06-11 Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria Coelho, Rafael Vieira de Avila e Silva, Scheila Echeverrigaray, Sergio Delamare, Ana Paula Longaray Data Brief Genetics, Genomics and Molecular Biology This paper presents a prediction of Bacillus subtilis promoters using a Support Vector Machine system. In the literature, there is a lack of information on Gram-positive bacterial promoter sequences compared to Gram-negative bacteria. Promoter sequence identification is essential for studying gene expression. Initially, we collected the B. subtilis genome sequence from the NCBI database, and promoters were identified by their sigma factors in the DBTBS database. We then grouped the promoters according to 15 factors in 2 domains, corresponding to sigma 54 and sigma 70 of Gram-negative bacteria. Based on these data we developed a script in Python to search for promoters in the B. subtilis genome. After processing the data, we obtained 767 promoter sequences for B. subtilis, most of which were recognized by sigma SigA. To validate the data we found, we developed a software package called BacSVM+, which receives promoters as input and returns the best combination of parameters in a LibSVM library to predict promoter regions in the bacteria used in the simulation. All data gathered as well as the BacSVM+ software is available for download at http://bacpp.bioinfoucs.com/rafael/Sigmas.zip. Elsevier 2018-05-13 /pmc/articles/PMC5993011/ /pubmed/29892645 http://dx.doi.org/10.1016/j.dib.2018.05.025 Text en © 2018 The Authors http://creativecommons.org/licenses/by/4.0/ This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
title Bacillus subtilis promoter sequences data set for promoter prediction in Gram-positive bacteria
topic Genetics, Genomics and Molecular Biology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5993011/
https://www.ncbi.nlm.nih.gov/pubmed/29892645
http://dx.doi.org/10.1016/j.dib.2018.05.025