A quantitative reference transcriptome for Nematostella vectensis early embryonic development: a pipeline for de novo assembly in emerging model systems

BACKGROUND: The de novo assembly of transcriptomes from short shotgun sequences raises challenges due to random and non-random sequencing biases and inherent transcript complexity. We sought to define a pipeline for de novo transcriptome assembly to aid researchers working with emerging model system...


Bibliographic Details
Main Authors:	Tulin, Sarah; Aguiar, Derek; Istrail, Sorin; Smith, Joel
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3748831/ https://www.ncbi.nlm.nih.gov/pubmed/23731568 http://dx.doi.org/10.1186/2041-9139-4-16

collection	PubMed
description	BACKGROUND: The de novo assembly of transcriptomes from short shotgun sequences raises challenges due to random and non-random sequencing biases and inherent transcript complexity. We sought to define a pipeline for de novo transcriptome assembly to aid researchers working with emerging model systems where well-annotated genome assemblies are not available as a reference. To detail this experimental and computational method, we used early embryos of the sea anemone, Nematostella vectensis, an emerging model system for studies of animal body plan evolution. We performed RNA-seq on embryos up to 24 h of development using Illumina HiSeq technology and evaluated independent de novo assembly methods. The resulting reads were assembled using either the Trinity assembler on all quality-controlled reads or both the Velvet and Oases assemblers on reads passing a stringent digital normalization filter. A control set of mRNA standards from the National Institute of Standards and Technology (NIST) was included in our experimental pipeline to invest our transcriptome with quantitative information on absolute transcript levels and to provide additional quality control. RESULTS: We generated >200 million paired-end reads from directional cDNA libraries representing well over 20 Gb of sequence. The Trinity assembler pipeline, including preliminary quality control steps, resulted in more than 86% of reads aligning with the reference transcriptome thus generated. Nevertheless, digital normalization combined with assembly by Velvet and Oases required far less computing power and decreased processing time while still mapping 82% of reads. We have made the raw sequencing reads and assembled transcriptome publicly available.
CONCLUSIONS: Nematostella vectensis was chosen for its strategic position in the tree of life for studies into the origins of the animal body plan; however, the challenge of reference-free transcriptome assembly is relevant to all systems for which well-annotated gene models and an independently verified genome assembly may not be available. To navigate this new territory, we have constructed a pipeline for library preparation and computational analysis for de novo transcriptome assembly. The gene models defined by this reference transcriptome define the set of genes transcribed in early Nematostella development and will provide a valuable dataset for further gene regulatory network investigations.
id	pubmed-3748831
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
spelling	pubmed-3748831 2013-08-22 EvoDevo Research BioMed Central 2013-06-03 /pmc/articles/PMC3748831/ /pubmed/23731568 http://dx.doi.org/10.1186/2041-9139-4-16 Text en Copyright © 2013 Tulin et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.


Similar Items