
gammaBOriS: Identification and Taxonomic Classification of Origins of Replication in Gammaproteobacteria using Motif-based Machine Learning

The biology of bacterial cells is, in general, based on information encoded on circular chromosomes. Regulation of chromosome replication is an essential process that mostly takes place at the origin of replication (oriC), a locus unique per chromosome. Identification of high numbers of oriC is a pr...


Bibliographic Details
Main authors: Sperlea, Theodor; Muth, Lea; Martin, Roman; Weigel, Christoph; Waldminghaus, Torsten; Heider, Dominik
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174414/
https://www.ncbi.nlm.nih.gov/pubmed/32317695
http://dx.doi.org/10.1038/s41598-020-63424-7
author Sperlea, Theodor
Muth, Lea
Martin, Roman
Weigel, Christoph
Waldminghaus, Torsten
Heider, Dominik
collection PubMed
description The biology of bacterial cells is, in general, based on information encoded on circular chromosomes. Regulation of chromosome replication is an essential process that mostly takes place at the origin of replication (oriC), a locus unique per chromosome. Identification of high numbers of oriC is a prerequisite for systematic studies that could lead to insights into oriC functioning as well as the identification of novel drug targets for antibiotic development. Current methods for identifying oriC sequences rely on chromosome-wide nucleotide disparities and are therefore limited to fully sequenced genomes, leaving a large number of genomic fragments unstudied. Here, we present gammaBOriS (Gammaproteobacterial oriC Searcher), which identifies oriC sequences on gammaproteobacterial chromosomal fragments. It does so by employing motif-based machine learning methods. Using gammaBOriS, we created BOriS DB, which currently contains 25,827 gammaproteobacterial oriC sequences from 1,217 species, thus making it the largest available database for oriC sequences to date. Furthermore, we present gammaBOriTax, a machine-learning based approach for taxonomic classification of oriC sequences, which was trained on the sequences in BOriS DB. Finally, we extracted the motifs relevant for identification and classification decisions of the models. Our results suggest that machine learning sequence classification approaches can offer great support in functional motif identification.
format Online
Article
Text
id pubmed-7174414
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-7174414 2020-04-24 Sci Rep Article
Nature Publishing Group UK 2020-04-21 /pmc/articles/PMC7174414/ /pubmed/32317695 http://dx.doi.org/10.1038/s41598-020-63424-7 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
title gammaBOriS: Identification and Taxonomic Classification of Origins of Replication in Gammaproteobacteria using Motif-based Machine Learning
topic Article