
Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome


Bibliographic Details
Main authors: Hücker, Sarah M., Ardern, Zachary, Goldberg, Tatyana, Schafferhans, Andrea, Bernhofer, Michael, Vestergaard, Gisle, Nelson, Chase W., Schloter, Michael, Rost, Burkhard, Scherer, Siegfried, Neuhaus, Klaus
Format: Online Article Text
Language: English
Published: Public Library of Science 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5597208/
https://www.ncbi.nlm.nih.gov/pubmed/28902868
http://dx.doi.org/10.1371/journal.pone.0184119
author Hücker, Sarah M.
Ardern, Zachary
Goldberg, Tatyana
Schafferhans, Andrea
Bernhofer, Michael
Vestergaard, Gisle
Nelson, Chase W.
Schloter, Michael
Rost, Burkhard
Scherer, Siegfried
Neuhaus, Klaus
author_sort Hücker, Sarah M.
collection PubMed
description In the past, short protein-coding genes were often disregarded by genome annotation pipelines. Transcriptome sequencing (RNAseq) signals outside of annotated genes have usually been interpreted to indicate either ncRNA or pervasive transcription. Therefore, in addition to the transcriptome, the translatome (RIBOseq) of the enteric pathogen Escherichia coli O157:H7 strain Sakai was determined at two optimal growth conditions and a severe stress condition combining low temperature and high osmotic pressure. All intergenic open reading frames potentially encoding a protein of ≥ 30 amino acids were investigated with regard to coverage by transcription and translation signals and their translatability expressed by the ribosomal coverage value. This led to the discovery of 465 unique, putative novel genes not yet annotated in this E. coli strain, which are evenly distributed over both DNA strands of the genome. For 255 of the novel genes, annotated homologs in other bacteria were found, and a machine-learning algorithm, trained on small protein-coding E. coli genes, predicted that 89% of these translated open reading frames represent bona fide genes. The remaining 210 putative novel genes without annotated homologs were compared to the 255 novel genes with homologs and to 250 short annotated genes of this E. coli strain. All three groups turned out to be similar with respect to their translatability distribution, fractions of differentially regulated genes, secondary structure composition, and the distribution of evolutionary constraint, suggesting that both novel groups represent legitimate genes. However, the machine-learning algorithm recognized only a small fraction of the 210 genes without annotated homologs. It is possible that these genes represent a novel group of genes with unusual features dissimilar to those of the genes in the machine-learning algorithm's training set.
format Online
Article
Text
id pubmed-5597208
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-5597208 2017-09-15 PLoS One Research Article Public Library of Science 2017-09-13 /pmc/articles/PMC5597208/ /pubmed/28902868 http://dx.doi.org/10.1371/journal.pone.0184119 Text en © 2017 Hücker et al. http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
title Discovery of numerous novel small genes in the intergenic regions of the Escherichia coli O157:H7 Sakai genome
topic Research Article