Improved annotation with de novo transcriptome assembly in four social amoeba species
BACKGROUND: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data....
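Main authors: | Singh, Reema; Lawal, Hajara M.; Schilde, Christina; Glöckner, Gernot; Barton, Geoffrey J.; Schaap, Pauline; Cole, Christian |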
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5282741/ https://www.ncbi.nlm.nih.gov/pubmed/28143409 http://dx.doi.org/10.1186/s12864-017-3505-0 |
_version_ | 1782503382970269696 |
---|---|
author | Singh, Reema Lawal, Hajara M. Schilde, Christina Glöckner, Gernot Barton, Geoffrey J. Schaap, Pauline Cole, Christian |
author_sort | Singh, Reema |
collection | PubMed |
description | BACKGROUND: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. RESULTS: An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes are also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. CONCLUSIONS: In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy for follow-up. The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3505-0) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5282741 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-52827412017-02-03 Improved annotation with de novo transcriptome assembly in four social amoeba species Singh, Reema Lawal, Hajara M. Schilde, Christina Glöckner, Gernot Barton, Geoffrey J. Schaap, Pauline Cole, Christian BMC Genomics Research Article BACKGROUND: Annotation of gene models and transcripts is a fundamental step in genome sequencing projects. Often this is performed with automated prediction pipelines, which can miss complex and atypical genes or transcripts. RNA sequencing (RNA-seq) data can aid the annotation with empirical data. Here we present de novo transcriptome assemblies generated from RNA-seq data in four Dictyostelid species: D. discoideum, P. pallidum, D. fasciculatum and D. lacteum. The assemblies were incorporated with existing gene models to determine corrections and improvement on a whole-genome scale. This is the first time this has been performed in these eukaryotic species. RESULTS: An initial de novo transcriptome assembly was generated by Trinity for each species and then refined with Program to Assemble Spliced Alignments (PASA). The completeness and quality were assessed with the Benchmarking Universal Single-Copy Orthologs (BUSCO) and Transrate tools at each stage of the assemblies. The final datasets of 11,315-12,849 transcripts contained 5,610-7,712 updates and corrections to >50% of existing gene models including changes to hundreds or thousands of protein products. Putative novel genes are also identified and alternative splice isoforms were observed for the first time in P. pallidum, D. lacteum and D. fasciculatum. CONCLUSIONS: In taking a whole transcriptome approach to genome annotation with empirical data we have been able to enrich the annotations of four existing genome sequencing projects. In doing so we have identified updates to the majority of the gene annotations across all four species under study and found putative novel genes and transcripts which could be worthy for follow-up. 
The new transcriptome data we present here will be a valuable resource for genome curators in the Dictyostelia and we propose this effective methodology for use in other genome annotation projects. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3505-0) contains supplementary material, which is available to authorized users. BioMed Central 2017-01-31 /pmc/articles/PMC5282741/ /pubmed/28143409 http://dx.doi.org/10.1186/s12864-017-3505-0 Text en © The Author(s). 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Article Singh, Reema Lawal, Hajara M. Schilde, Christina Glöckner, Gernot Barton, Geoffrey J. Schaap, Pauline Cole, Christian Improved annotation with de novo transcriptome assembly in four social amoeba species |
title | Improved annotation with de novo transcriptome assembly in four social amoeba species |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5282741/ https://www.ncbi.nlm.nih.gov/pubmed/28143409 http://dx.doi.org/10.1186/s12864-017-3505-0 |