Cargando…
Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serotype 5a M90T and genome-wide transcriptional start site determination
BACKGROUND: Shigella is a Gram-negative facultative intracellular bacterium that causes bacillary dysentery in humans. Shigella invades cells of the colonic mucosa owing to its virulence plasmid-encoded Type 3 Secretion System (T3SS), and multiplies in the target cell cytosol. Although the laborator...
Autores principales: | , , |
---|---|
Formato: | Online Artículo Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2020
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7132871/ https://www.ncbi.nlm.nih.gov/pubmed/32252626 http://dx.doi.org/10.1186/s12864-020-6565-5 |
_version_ | 1783517521663492096 |
---|---|
author | Cervantes-Rivera, Ramón Tronnet, Sophie Puhar, Andrea |
author_facet | Cervantes-Rivera, Ramón Tronnet, Sophie Puhar, Andrea |
author_sort | Cervantes-Rivera, Ramón |
collection | PubMed |
description | BACKGROUND: Shigella is a Gram-negative facultative intracellular bacterium that causes bacillary dysentery in humans. Shigella invades cells of the colonic mucosa owing to its virulence plasmid-encoded Type 3 Secretion System (T3SS), and multiplies in the target cell cytosol. Although the laboratory reference strain S. flexneri serotype 5a M90T has been extensively used to understand the molecular mechanisms of pathogenesis, its complete genome sequence is not available, thereby greatly limiting studies employing high-throughput sequencing and systems biology approaches. RESULTS: We have sequenced, assembled, annotated and manually curated the full genome of S. flexneri 5a M90T. This yielded two complete circular contigs, the chromosome and the virulence plasmid (pWR100). To obtain the genome sequence, we have employed long-read PacBio DNA sequencing followed by polishing with Illumina RNA-seq data. This provides a new hybrid strategy to prepare gapless, highly accurate genome sequences, which also cover AT-rich tracks or repetitive sequences that are transcribed. Furthermore, we have performed genome-wide analysis of transcriptional start sites (TSS) and determined the length of 5′ untranslated regions (5′-UTRs) at typical culture conditions for the inoculum of in vitro infection experiments. We identified 6723 primary TSS (pTSS) and 7328 secondary TSS (sTSS). The S. flexneri 5a M90T annotated genome sequence and the transcriptional start sites are integrated into RegulonDB (http://regulondb.ccg.unam.mx) and RSAT (http://embnet.ccg.unam.mx/rsat/) databases to use their analysis tools in the S. flexneri 5a M90T genome. CONCLUSIONS: We provide the first complete genome for S. flexneri serotype 5a, specifically the laboratory reference strain M90T. Our work opens the possibility of employing S. flexneri M90T in high-quality systems biology studies such as transcriptomic and differential expression analyses or in genome evolution studies. Moreover, the catalogue of TSS that we report here can be used in molecular pathogenesis studies as a resource to know which genes are transcribed before infection of host cells. The genome sequence, together with the analysis of transcriptional start sites, is also a valuable tool for precise genetic manipulation of S. flexneri 5a M90T. Further, we present a new hybrid strategy to prepare gapless, highly accurate genome sequences. Unlike currently used hybrid strategies combining long- and short-read DNA sequencing technologies to maximize accuracy, our workflow using long-read DNA sequencing and short-read RNA sequencing provides the added value of using non-redundant technologies, which yield distinct, exploitable datasets. |
format | Online Article Text |
id | pubmed-7132871 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-71328712020-04-11 Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serotype 5a M90T and genome-wide transcriptional start site determination Cervantes-Rivera, Ramón Tronnet, Sophie Puhar, Andrea BMC Genomics Research Article BACKGROUND: Shigella is a Gram-negative facultative intracellular bacterium that causes bacillary dysentery in humans. Shigella invades cells of the colonic mucosa owing to its virulence plasmid-encoded Type 3 Secretion System (T3SS), and multiplies in the target cell cytosol. Although the laboratory reference strain S. flexneri serotype 5a M90T has been extensively used to understand the molecular mechanisms of pathogenesis, its complete genome sequence is not available, thereby greatly limiting studies employing high-throughput sequencing and systems biology approaches. RESULTS: We have sequenced, assembled, annotated and manually curated the full genome of S. flexneri 5a M90T. This yielded two complete circular contigs, the chromosome and the virulence plasmid (pWR100). To obtain the genome sequence, we have employed long-read PacBio DNA sequencing followed by polishing with Illumina RNA-seq data. This provides a new hybrid strategy to prepare gapless, highly accurate genome sequences, which also cover AT-rich tracks or repetitive sequences that are transcribed. Furthermore, we have performed genome-wide analysis of transcriptional start sites (TSS) and determined the length of 5′ untranslated regions (5′-UTRs) at typical culture conditions for the inoculum of in vitro infection experiments. We identified 6723 primary TSS (pTSS) and 7328 secondary TSS (sTSS). The S. flexneri 5a M90T annotated genome sequence and the transcriptional start sites are integrated into RegulonDB (http://regulondb.ccg.unam.mx) and RSAT (http://embnet.ccg.unam.mx/rsat/) databases to use their analysis tools in the S. flexneri 5a M90T genome. CONCLUSIONS: We provide the first complete genome for S. flexneri serotype 5a, specifically the laboratory reference strain M90T. Our work opens the possibility of employing S. flexneri M90T in high-quality systems biology studies such as transcriptomic and differential expression analyses or in genome evolution studies. Moreover, the catalogue of TSS that we report here can be used in molecular pathogenesis studies as a resource to know which genes are transcribed before infection of host cells. The genome sequence, together with the analysis of transcriptional start sites, is also a valuable tool for precise genetic manipulation of S. flexneri 5a M90T. Further, we present a new hybrid strategy to prepare gapless, highly accurate genome sequences. Unlike currently used hybrid strategies combining long- and short-read DNA sequencing technologies to maximize accuracy, our workflow using long-read DNA sequencing and short-read RNA sequencing provides the added value of using non-redundant technologies, which yield distinct, exploitable datasets. BioMed Central 2020-04-06 /pmc/articles/PMC7132871/ /pubmed/32252626 http://dx.doi.org/10.1186/s12864-020-6565-5 Text en © The Author(s). 2020 Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Cervantes-Rivera, Ramón Tronnet, Sophie Puhar, Andrea Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serotype 5a M90T and genome-wide transcriptional start site determination |
title | Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serotype 5a M90T and genome-wide transcriptional start site determination |
title_full | Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serotype 5a M90T and genome-wide transcriptional start site determination |
title_fullStr | Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serotype 5a M90T and genome-wide transcriptional start site determination |
title_full_unstemmed | Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serotype 5a M90T and genome-wide transcriptional start site determination |
title_short | Complete genome sequence and annotation of the laboratory reference strain Shigella flexneri serotype 5a M90T and genome-wide transcriptional start site determination |
title_sort | complete genome sequence and annotation of the laboratory reference strain shigella flexneri serotype 5a m90t and genome-wide transcriptional start site determination |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7132871/ https://www.ncbi.nlm.nih.gov/pubmed/32252626 http://dx.doi.org/10.1186/s12864-020-6565-5 |
work_keys_str_mv | AT cervantesriveraramon completegenomesequenceandannotationofthelaboratoryreferencestrainshigellaflexneriserotype5am90tandgenomewidetranscriptionalstartsitedetermination AT tronnetsophie completegenomesequenceandannotationofthelaboratoryreferencestrainshigellaflexneriserotype5am90tandgenomewidetranscriptionalstartsitedetermination AT puharandrea completegenomesequenceandannotationofthelaboratoryreferencestrainshigellaflexneriserotype5am90tandgenomewidetranscriptionalstartsitedetermination |