Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression
BACKGROUND: The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offe...
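Main authors: | Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda |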
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | Methodology Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5485702/ https://www.ncbi.nlm.nih.gov/pubmed/28651633 http://dx.doi.org/10.1186/s12864-017-3887-z |
_version_ | 1783246121849585664 |
---|---|
author | Arnaiz, Olivier Van Dijk, Erwin Bétermier, Mireille Lhuillier-Akakpo, Maoussi de Vanssay, Augustin Duharcourt, Sandra Sallet, Erika Gouzy, Jérôme Sperling, Linda |
author_facet | Arnaiz, Olivier Van Dijk, Erwin Bétermier, Mireille Lhuillier-Akakpo, Maoussi de Vanssay, Augustin Duharcourt, Sandra Sallet, Erika Gouzy, Jérôme Sperling, Linda |
author_sort | Arnaiz, Olivier |
collection | PubMed |
description | BACKGROUND: The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. RESULTS: We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. CONCLUSIONS: We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3′ and 5′ UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB (http://paramecium.i2bc.paris-saclay.fr). TrUC software is freely distributed under a GNU GPL v3 licence (https://github.com/oarnaiz/TrUC). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3887-z) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5485702 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5485702 2017-06-30 Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression Arnaiz, Olivier Van Dijk, Erwin Bétermier, Mireille Lhuillier-Akakpo, Maoussi de Vanssay, Augustin Duharcourt, Sandra Sallet, Erika Gouzy, Jérôme Sperling, Linda BMC Genomics Methodology Article BACKGROUND: The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. RESULTS: We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. CONCLUSIONS: We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3′ and 5′ UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB (http://paramecium.i2bc.paris-saclay.fr). TrUC software is freely distributed under a GNU GPL v3 licence (https://github.com/oarnaiz/TrUC). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3887-z) contains supplementary material, which is available to authorized users. BioMed Central 2017-06-26 /pmc/articles/PMC5485702/ /pubmed/28651633 http://dx.doi.org/10.1186/s12864-017-3887-z Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Arnaiz, Olivier Van Dijk, Erwin Bétermier, Mireille Lhuillier-Akakpo, Maoussi de Vanssay, Augustin Duharcourt, Sandra Sallet, Erika Gouzy, Jérôme Sperling, Linda Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression |
title | Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression |
title_full | Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression |
title_fullStr | Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression |
title_full_unstemmed | Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression |
title_short | Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression |
title_sort | improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5485702/ https://www.ncbi.nlm.nih.gov/pubmed/28651633 http://dx.doi.org/10.1186/s12864-017-3887-z |
work_keys_str_mv | AT arnaizolivier improvedmethodsandresourcesforparameciumgenomicstranscriptionunitsgeneannotationandgeneexpression AT vandijkerwin improvedmethodsandresourcesforparameciumgenomicstranscriptionunitsgeneannotationandgeneexpression AT betermiermireille improvedmethodsandresourcesforparameciumgenomicstranscriptionunitsgeneannotationandgeneexpression AT lhuillierakakpomaoussi improvedmethodsandresourcesforparameciumgenomicstranscriptionunitsgeneannotationandgeneexpression AT devanssayaugustin improvedmethodsandresourcesforparameciumgenomicstranscriptionunitsgeneannotationandgeneexpression AT duharcourtsandra improvedmethodsandresourcesforparameciumgenomicstranscriptionunitsgeneannotationandgeneexpression AT salleterika improvedmethodsandresourcesforparameciumgenomicstranscriptionunitsgeneannotationandgeneexpression AT gouzyjerome improvedmethodsandresourcesforparameciumgenomicstranscriptionunitsgeneannotationandgeneexpression AT sperlinglinda improvedmethodsandresourcesforparameciumgenomicstranscriptionunitsgeneannotationandgeneexpression |