Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes

Bibliographic Details
Main authors:	Johnson, Lisa K; Alexander, Harriet; Brown, C Titus
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2018
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481552/ https://www.ncbi.nlm.nih.gov/pubmed/30544207 http://dx.doi.org/10.1093/gigascience/giy158

author	Johnson, Lisa K; Alexander, Harriet; Brown, C Titus
collection	PubMed
description	BACKGROUND: De novo transcriptome assemblies are required prior to analyzing RNA sequencing data from a species without an existing reference genome or transcriptome. Despite the prevalence of transcriptomic studies, the effects of using different workflows, or “pipelines,” on the resulting assemblies are poorly understood. Here, a pipeline was programmatically automated and used to assemble and annotate raw transcriptomic short-read data collected as part of the Marine Microbial Eukaryotic Transcriptome Sequencing Project. The resulting transcriptome assemblies were evaluated and compared against assemblies that were previously generated with a different pipeline developed by the National Center for Genome Research. RESULTS: New transcriptome assemblies contained the majority of previous contigs as well as new content. On average, 7.8% of the annotated contigs in the new assemblies were novel gene names not found in the previous assemblies. Taxonomic trends were observed in the assembly metrics. Assemblies from the Dinoflagellata showed a higher number of contigs and unique k-mers than transcriptomes from other phyla, while assemblies from Ciliophora had a lower percentage of open reading frames compared to other phyla. CONCLUSIONS: Given current bioinformatics approaches, there is no single “best” reference transcriptome for a particular set of raw data. As the optimum transcriptome is a moving target, improving (or not) with new tools and approaches, automated and programmable pipelines are invaluable for managing the computationally intensive tasks required for re-processing large sets of samples with revised pipelines and ensuring a common evaluation workflow is applied to all samples. Thus, re-assembling existing data with new tools using automated and programmable pipelines may yield more accurate identification of taxon-specific trends across samples in addition to novel and useful products for the community.
format	Online Article Text
id	pubmed-6481552
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-6481552 2019-04-29 Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes Johnson, Lisa K; Alexander, Harriet; Brown, C Titus Gigascience Research Oxford University Press 2018-12-13 /pmc/articles/PMC6481552/ /pubmed/30544207 http://dx.doi.org/10.1093/gigascience/giy158 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481552/ https://www.ncbi.nlm.nih.gov/pubmed/30544207 http://dx.doi.org/10.1093/gigascience/giy158

Similar Items