Common and phylogenetically widespread coding for peptides by bacterial small RNAs

BACKGROUND: While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding, due to their lack of long open reading frames (ORFs). Ho...

Bibliographic Details
Main authors: Friedman, Robin C.; Kalkhof, Stefan; Doppelt-Azeroual, Olivia; Mueller, Stephan A.; Chovancová, Martina; von Bergen, Martin; Schwikowski, Benno
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5521070/
https://www.ncbi.nlm.nih.gov/pubmed/28732463
http://dx.doi.org/10.1186/s12864-017-3932-y
_version_ 1783251911230619648
author Friedman, Robin C.
Kalkhof, Stefan
Doppelt-Azeroual, Olivia
Mueller, Stephan A.
Chovancová, Martina
von Bergen, Martin
Schwikowski, Benno
author_sort Friedman, Robin C.
collection PubMed
description BACKGROUND: While eukaryotic noncoding RNAs have recently received intense scrutiny, it is becoming clear that bacterial transcription is at least as pervasive. Bacterial small RNAs and antisense RNAs (sRNAs) are often assumed to be noncoding because they lack long open reading frames (ORFs). However, there are numerous examples of sRNAs encoding small proteins, whether or not they also have a regulatory role at the RNA level.
METHODS: Here, we apply flexible machine learning techniques based on sequence features and comparative genomics to quantify the prevalence of sRNA ORFs under natural selection to maintain protein-coding function in 14 phylogenetically diverse bacteria. Importantly, we quantify uncertainty in our predictions, and follow up on them using mass spectrometry proteomics and comparison to datasets including ribosome profiling.
RESULTS: A majority of annotated sRNAs have at least one ORF between 10 and 50 amino acids long, and we conservatively predict that 409±191.7 unannotated sRNA ORFs are under selection to maintain coding (mean estimate and 95% confidence interval), an average of 29 per species considered here. This implies that overall at least 10.3±0.5% of sRNAs have a coding ORF, and in some species around 20% do. 165±69 of these novel coding ORFs have some antisense overlap with annotated ORFs. As experimental validation, many of our predictions are translated in published ribosome profiling data and are identified via mass spectrometry shotgun proteomics. B. subtilis sRNAs with coding ORFs are enriched for high expression in biofilms and confluent growth, and S. pneumoniae sRNAs with coding ORFs are involved in virulence. sRNA coding ORFs are enriched for transmembrane domains, and many are predicted novel components of type I toxin/antitoxin systems.
CONCLUSIONS: We predict over two dozen new protein-coding genes per bacterial species and, crucially, also quantify the uncertainty in this estimate. Our predictions for sRNA coding ORFs, along with predicted novel type I toxins and tools for sorting and visualizing genomic context, are freely available in a user-friendly format at http://disco-bac.web.pasteur.fr. We expect these easily accessible predictions to be a valuable tool for the study not only of bacterial sRNAs and type I toxin-antitoxin systems, but also of bacterial genetics and genomics.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3932-y) contains supplementary material, which is available to authorized users.
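The per-species figure quoted in the RESULTS can be checked with a few lines of arithmetic. The sketch below is only an illustration built from the numbers in the abstract (409±191.7 ORFs, 14 species). The variable and function names are hypothetical, and it assumes the 95% interval applies to the total and can simply be rescaled by the number of species; it does not reproduce the authors' per-species estimates.

```python
# Figures quoted in the RESULTS section of the abstract.
predicted_orfs = 409.0   # mean estimate of unannotated coding sRNA ORFs
half_width = 191.7       # half-width of the 95% confidence interval
n_species = 14           # number of bacteria analysed


def per_species(total, n):
    """Average count per species for a given total (simple rescaling)."""
    return total / n


# Assumption: the interval on the total is rescaled uniformly; this is
# not a per-species confidence interval from the paper.
mean_per_species = per_species(predicted_orfs, n_species)
low = per_species(predicted_orfs - half_width, n_species)
high = per_species(predicted_orfs + half_width, n_species)

print(f"{mean_per_species:.1f} per species "
      f"(rescaled interval {low:.1f} to {high:.1f})")
```

The mean rounds to the 29 per species stated in the abstract, which is consistent with the "over two dozen" wording in the CONCLUSIONS.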
format Online
Article
Text
id pubmed-5521070
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5521070 2017-07-21 BMC Genomics Methodology Article BioMed Central /pmc/articles/PMC5521070/ /pubmed/28732463 http://dx.doi.org/10.1186/s12864-017-3932-y Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Common and phylogenetically widespread coding for peptides by bacterial small RNAs
topic Methodology Article