gammaBOriS: Identification and Taxonomic Classification of Origins of Replication in Gammaproteobacteria using Motif-based Machine Learning
The biology of bacterial cells is, in general, based on information encoded on circular chromosomes. Regulation of chromosome replication is an essential process that mostly takes place at the origin of replication (oriC), a locus unique per chromosome. Identification of high numbers of oriC is a pr...
Main Authors: | Sperlea, Theodor; Muth, Lea; Martin, Roman; Weigel, Christoph; Waldminghaus, Torsten; Heider, Dominik |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Nature Publishing Group UK, 2020 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174414/ https://www.ncbi.nlm.nih.gov/pubmed/32317695 http://dx.doi.org/10.1038/s41598-020-63424-7 |
_version_ | 1783524633086001152 |
---|---|
author | Sperlea, Theodor Muth, Lea Martin, Roman Weigel, Christoph Waldminghaus, Torsten Heider, Dominik |
author_sort | Sperlea, Theodor |
collection | PubMed |
description | The biology of bacterial cells is, in general, based on information encoded on circular chromosomes. Regulation of chromosome replication is an essential process that mostly takes place at the origin of replication (oriC), a locus unique per chromosome. Identification of high numbers of oriC is a prerequisite for systematic studies that could lead to insights into oriC functioning as well as the identification of novel drug targets for antibiotic development. Current methods for identifying oriC sequences rely on chromosome-wide nucleotide disparities and are therefore limited to fully sequenced genomes, leaving a large number of genomic fragments unstudied. Here, we present gammaBOriS (Gammaproteobacterial oriC Searcher), which identifies oriC sequences on gammaproteobacterial chromosomal fragments. It does so by employing motif-based machine learning methods. Using gammaBOriS, we created BOriS DB, which currently contains 25,827 gammaproteobacterial oriC sequences from 1,217 species, thus making it the largest available database for oriC sequences to date. Furthermore, we present gammaBOriTax, a machine-learning based approach for taxonomic classification of oriC sequences, which was trained on the sequences in BOriS DB. Finally, we extracted the motifs relevant for identification and classification decisions of the models. Our results suggest that machine learning sequence classification approaches can offer great support in functional motif identification. |
format | Online Article Text |
id | pubmed-7174414 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Nature Publishing Group UK |
record_format | MEDLINE/PubMed |
spelling | pubmed-7174414 2020-04-24 Sci Rep Article Nature Publishing Group UK 2020-04-21 /pmc/articles/PMC7174414/ /pubmed/32317695 http://dx.doi.org/10.1038/s41598-020-63424-7 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/. |
title | gammaBOriS: Identification and Taxonomic Classification of Origins of Replication in Gammaproteobacteria using Motif-based Machine Learning |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7174414/ https://www.ncbi.nlm.nih.gov/pubmed/32317695 http://dx.doi.org/10.1038/s41598-020-63424-7 |