
Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes

In a conventional view of the prokaryotic genome organization, promoters precede operons and ribosome binding sites (RBSs) with Shine-Dalgarno consensus precede genes. However, recent experimental research suggesting a more diverse view motivated us to develop an algorithm with improved gene-finding...


Bibliographic Details
Main Authors: Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun; Borodovsky, Mark
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2018
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6028130/
https://www.ncbi.nlm.nih.gov/pubmed/29773659
http://dx.doi.org/10.1101/gr.230615.117
author Lomsadze, Alexandre
Gemayel, Karl
Tang, Shiyuyun
Borodovsky, Mark
collection PubMed
description In a conventional view of the prokaryotic genome organization, promoters precede operons and ribosome binding sites (RBSs) with Shine-Dalgarno consensus precede genes. However, recent experimental research suggesting a more diverse view motivated us to develop an algorithm with improved gene-finding accuracy. We describe GeneMarkS-2, an ab initio algorithm that uses a model derived by self-training for finding species-specific (native) genes, along with an array of precomputed “heuristic” models designed to identify harder-to-detect genes (likely horizontally transferred). Importantly, we designed GeneMarkS-2 to identify several types of distinct sequence patterns (signals) involved in gene expression control, among them the patterns characteristic for leaderless transcription as well as noncanonical RBS patterns. To assess the accuracy of GeneMarkS-2, we used genes validated by COG (Clusters of Orthologous Groups) annotation, proteomics experiments, and N-terminal protein sequencing. We observed that GeneMarkS-2 performed better on average in all accuracy measures when compared with the current state-of-the-art gene prediction tools. Furthermore, the screening of ∼5000 representative prokaryotic genomes made by GeneMarkS-2 predicted frequent leaderless transcription in both archaea and bacteria. We also observed that the RBS sites in some species with leadered transcription did not necessarily exhibit the Shine-Dalgarno consensus. The modeling of different types of sequence motifs regulating gene expression prompted a division of prokaryotic genomes into five categories with distinct sequence patterns around the gene starts.
format Online
Article
Text
id pubmed-6028130
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-6028130 2019-01-01 Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes. Lomsadze, Alexandre; Gemayel, Karl; Tang, Shiyuyun; Borodovsky, Mark. Genome Res. Method.
Cold Spring Harbor Laboratory Press 2018-07 /pmc/articles/PMC6028130/ /pubmed/29773659 http://dx.doi.org/10.1101/gr.230615.117 Text en © 2018 Lomsadze et al.; Published by Cold Spring Harbor Laboratory Press http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by Cold Spring Harbor Laboratory Press for the first six months after the full-issue publication date (see http://genome.cshlp.org/site/misc/terms.xhtml). After six months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Modeling leaderless transcription and atypical genes results in more accurate gene prediction in prokaryotes
topic Method