Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes
BACKGROUND: De novo transcriptome assemblies are required prior to analyzing RNA sequencing data from a species without an existing reference genome or transcriptome. Despite the prevalence of transcriptomic studies, the effects of using different workflows, or “pipelines,” on the resulting ass...
Main authors: | Johnson, Lisa K; Alexander, Harriet; Brown, C Titus |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2018 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481552/ https://www.ncbi.nlm.nih.gov/pubmed/30544207 http://dx.doi.org/10.1093/gigascience/giy158 |
_version_ | 1783413768190951424 |
---|---|
author | Johnson, Lisa K Alexander, Harriet Brown, C Titus |
author_facet | Johnson, Lisa K Alexander, Harriet Brown, C Titus |
author_sort | Johnson, Lisa K |
collection | PubMed |
description | BACKGROUND: De novo transcriptome assemblies are required prior to analyzing RNA sequencing data from a species without an existing reference genome or transcriptome. Despite the prevalence of transcriptomic studies, the effects of using different workflows, or “pipelines,” on the resulting assemblies are poorly understood. Here, a pipeline was programmatically automated and used to assemble and annotate raw transcriptomic short-read data collected as part of the Marine Microbial Eukaryotic Transcriptome Sequencing Project. The resulting transcriptome assemblies were evaluated and compared against assemblies that were previously generated with a different pipeline developed by the National Center for Genome Research. RESULTS: New transcriptome assemblies contained the majority of previous contigs as well as new content. On average, 7.8% of the annotated contigs in the new assemblies were novel gene names not found in the previous assemblies. Taxonomic trends were observed in the assembly metrics. Assemblies from the Dinoflagellata showed a higher number of contigs and unique k-mers than transcriptomes from other phyla, while assemblies from Ciliophora had a lower percentage of open reading frames compared to other phyla. CONCLUSIONS: Given current bioinformatics approaches, there is no single “best” reference transcriptome for a particular set of raw data. As the optimum transcriptome is a moving target, improving (or not) with new tools and approaches, automated and programmable pipelines are invaluable for managing the computationally intensive tasks required for re-processing large sets of samples with revised pipelines and ensuring a common evaluation workflow is applied to all samples. Thus, re-assembling existing data with new tools using automated and programmable pipelines may yield more accurate identification of taxon-specific trends across samples in addition to novel and useful products for the community. |
format | Online Article Text |
id | pubmed-6481552 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-64815522019-04-29 Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes Johnson, Lisa K Alexander, Harriet Brown, C Titus Gigascience Research BACKGROUND: De novo transcriptome assemblies are required prior to analyzing RNA sequencing data from a species without an existing reference genome or transcriptome. Despite the prevalence of transcriptomic studies, the effects of using different workflows, or “pipelines,” on the resulting assemblies are poorly understood. Here, a pipeline was programmatically automated and used to assemble and annotate raw transcriptomic short-read data collected as part of the Marine Microbial Eukaryotic Transcriptome Sequencing Project. The resulting transcriptome assemblies were evaluated and compared against assemblies that were previously generated with a different pipeline developed by the National Center for Genome Research. RESULTS: New transcriptome assemblies contained the majority of previous contigs as well as new content. On average, 7.8% of the annotated contigs in the new assemblies were novel gene names not found in the previous assemblies. Taxonomic trends were observed in the assembly metrics. Assemblies from the Dinoflagellata showed a higher number of contigs and unique k-mers than transcriptomes from other phyla, while assemblies from Ciliophora had a lower percentage of open reading frames compared to other phyla. CONCLUSIONS: Given current bioinformatics approaches, there is no single “best” reference transcriptome for a particular set of raw data. As the optimum transcriptome is a moving target, improving (or not) with new tools and approaches, automated and programmable pipelines are invaluable for managing the computationally intensive tasks required for re-processing large sets of samples with revised pipelines and ensuring a common evaluation workflow is applied to all samples. Thus, re-assembling existing data with new tools using automated and programmable pipelines may yield more accurate identification of taxon-specific trends across samples in addition to novel and useful products for the community. Oxford University Press 2018-12-13 /pmc/articles/PMC6481552/ /pubmed/30544207 http://dx.doi.org/10.1093/gigascience/giy158 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Johnson, Lisa K Alexander, Harriet Brown, C Titus Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes |
title | Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes |
title_full | Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes |
title_fullStr | Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes |
title_full_unstemmed | Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes |
title_short | Re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes |
title_sort | re-assembly, quality evaluation, and annotation of 678 microbial eukaryotic reference transcriptomes |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6481552/ https://www.ncbi.nlm.nih.gov/pubmed/30544207 http://dx.doi.org/10.1093/gigascience/giy158 |
work_keys_str_mv | AT johnsonlisak reassemblyqualityevaluationandannotationof678microbialeukaryoticreferencetranscriptomes AT alexanderharriet reassemblyqualityevaluationandannotationof678microbialeukaryoticreferencetranscriptomes AT brownctitus reassemblyqualityevaluationandannotationof678microbialeukaryoticreferencetranscriptomes |