Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression

BACKGROUND: The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offe...

Bibliographic Details
Main Authors: Arnaiz, Olivier; Van Dijk, Erwin; Bétermier, Mireille; Lhuillier-Akakpo, Maoussi; de Vanssay, Augustin; Duharcourt, Sandra; Sallet, Erika; Gouzy, Jérôme; Sperling, Linda
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5485702/
https://www.ncbi.nlm.nih.gov/pubmed/28651633
http://dx.doi.org/10.1186/s12864-017-3887-z
_version_ 1783246121849585664
author Arnaiz, Olivier
Van Dijk, Erwin
Bétermier, Mireille
Lhuillier-Akakpo, Maoussi
de Vanssay, Augustin
Duharcourt, Sandra
Sallet, Erika
Gouzy, Jérôme
Sperling, Linda
author_sort Arnaiz, Olivier
collection PubMed
description BACKGROUND: The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. RESULTS: We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. CONCLUSIONS: We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3′ and 5′ UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB (http://paramecium.i2bc.paris-saclay.fr). TrUC software is freely distributed under a GNU GPL v3 licence (https://github.com/oarnaiz/TrUC). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3887-z) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-5485702
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-54857022017-06-30 Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression Arnaiz, Olivier Van Dijk, Erwin Bétermier, Mireille Lhuillier-Akakpo, Maoussi de Vanssay, Augustin Duharcourt, Sandra Sallet, Erika Gouzy, Jérôme Sperling, Linda BMC Genomics Methodology Article BACKGROUND: The 15 sibling species of the Paramecium aurelia cryptic species complex emerged after a whole genome duplication that occurred tens of millions of years ago. Given extensive knowledge of the genetics and epigenetics of Paramecium acquired over the last century, this species complex offers a uniquely powerful system to investigate the consequences of whole genome duplication in a unicellular eukaryote as well as the genetic and epigenetic mechanisms that drive speciation. High quality Paramecium gene models are important for research using this system. The major aim of the work reported here was to build an improved gene annotation pipeline for the Paramecium lineage. RESULTS: We generated oriented RNA-Seq transcriptome data across the sexual process of autogamy for the model species Paramecium tetraurelia. We determined, for the first time in a ciliate, candidate P. tetraurelia transcription start sites using an adapted Cap-Seq protocol. We developed TrUC, multi-threaded Perl software that in conjunction with TopHat mapping of RNA-Seq data to a reference genome, predicts transcription units for the annotation pipeline. We used EuGene software to combine annotation evidence. The high quality gene structural annotations obtained for P. tetraurelia were used as evidence to improve published annotations for 3 other Paramecium species. The RNA-Seq data were also used for differential gene expression analysis, providing a gene expression atlas that is more sensitive than the previously established microarray resource. 
CONCLUSIONS: We have developed a gene annotation pipeline tailored for the compact genomes and tiny introns of Paramecium species. A novel component of this pipeline, TrUC, predicts transcription units using Cap-Seq and oriented RNA-Seq data. TrUC could prove useful beyond Paramecium, especially in the case of high gene density. Accurate predictions of 3′ and 5′ UTR will be particularly valuable for studies of gene expression (e.g. nucleosome positioning, identification of cis regulatory motifs). The P. tetraurelia improved transcriptome resource, gene annotations for P. tetraurelia, P. biaurelia, P. sexaurelia and P. caudatum, and Paramecium-trained EuGene configuration are available through ParameciumDB (http://paramecium.i2bc.paris-saclay.fr). TrUC software is freely distributed under a GNU GPL v3 licence (https://github.com/oarnaiz/TrUC). ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3887-z) contains supplementary material, which is available to authorized users. BioMed Central 2017-06-26 /pmc/articles/PMC5485702/ /pubmed/28651633 http://dx.doi.org/10.1186/s12864-017-3887-z Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology Article
Arnaiz, Olivier
Van Dijk, Erwin
Bétermier, Mireille
Lhuillier-Akakpo, Maoussi
de Vanssay, Augustin
Duharcourt, Sandra
Sallet, Erika
Gouzy, Jérôme
Sperling, Linda
Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression
title Improved methods and resources for paramecium genomics: transcription units, gene annotation and gene expression
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5485702/
https://www.ncbi.nlm.nih.gov/pubmed/28651633
http://dx.doi.org/10.1186/s12864-017-3887-z