
Optimizing and benchmarking de novo transcriptome sequencing: from library preparation to assembly evaluation


Bibliographic Details
Main authors:	Hara, Yuichiro; Tatsumi, Kaori; Yoshida, Michio; Kajikawa, Eriko; Kiyonari, Hiroshi; Kuraku, Shigehiro
Format:	Online Article Text
Language:	English
Published:	BioMed Central, 2015
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4652379/ https://www.ncbi.nlm.nih.gov/pubmed/26581708 http://dx.doi.org/10.1186/s12864-015-2007-1

collection	PubMed
description	BACKGROUND: RNA-seq enables gene expression profiling in selected spatiotemporal windows and yields massive sequence information with relatively low cost and time investment, even for non-model species. However, there remains considerable room for optimizing its workflow in order to take full advantage of continuously developing sequencing capacity.
METHOD: Transcriptome sequencing of three embryonic stages of the Madagascar ground gecko (Paroedura picta) was performed on the Illumina platform. The output reads were assembled de novo to reconstruct transcript sequences. To evaluate the completeness of the transcriptome assemblies, we prepared a reference gene set consisting of vertebrate one-to-one orthologs.
RESULT: To take advantage of read lengths exceeding 150 nt, we shortened the RNA fragmentation time, which resulted in a dramatic shift of the insert size distribution. To evaluate the products of multiple de novo assembly runs incorporating reads with different RNA sources, read lengths, and insert sizes, we introduce a new reference gene set, core vertebrate genes (CVG), consisting of 233 genes that are shared as one-to-one orthologs by all vertebrate genomes examined (29 species). The completeness assessment performed by the computational pipelines CEGMA and BUSCO with reference to CVG demonstrated higher accuracy and resolution than the assessment with the gene set previously established for this purpose. Based on the assessment with CVG, we derived the most comprehensive transcript sequence set of the Madagascar ground gecko by assembling individual libraries and then clustering the assembled sequences based on their overall similarity.
CONCLUSION: Our results provide several insights into optimizing the de novo RNA-seq workflow, including the coordination between library insert size and read length, which manifested in improved connectivity of assemblies. The approach and the assembly assessment with CVG demonstrated here would be applicable to transcriptome analysis of other species as well as to whole genome analyses.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-015-2007-1) contains supplementary material, which is available to authorized users.
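The abstract does not spell out why insert size has to be matched to read length. Here is a minimal Python sketch of the read geometry. It is not the authors' pipeline, and the helper name `pair_geometry` and all example values are hypothetical. It assumes a standard paired-end layout: two equal-length mates, each reading inward from one end of the same insert.

Let I be the insert size and L the read length. When I is shorter than 2L, the mates overlap by 2L - I bases, so part of every pair re-reads the same sequence. When I is shorter than L, each mate also reads through into the adapter. Lengthening inserts, for example by shortening RNA fragmentation, converts that overlap into unique sequence.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairGeometry:
    overlap: int               # insert bases read by both mates
    inner_gap: int             # insert bases read by neither mate
    unique_bases: int          # distinct insert bases covered by the pair
    adapter_readthrough: bool  # insert shorter than a single read


def pair_geometry(insert_size: int, read_length: int) -> PairGeometry:
    """Hypothetical helper: how two equal-length mates tile one insert.

    Assumes mate 1 reads forward from the left end of the insert and
    mate 2 reads backward from the right end (standard paired-end layout).
    """
    if insert_size <= 0 or read_length <= 0:
        raise ValueError("insert_size and read_length must be positive")
    # A mate cannot cover more of the insert than the insert itself;
    # any remaining sequencing cycles read into the adapter.
    per_mate = min(read_length, insert_size)
    unique = min(insert_size, 2 * read_length)
    overlap = 2 * per_mate - unique
    inner_gap = max(0, insert_size - 2 * read_length)
    return PairGeometry(overlap, inner_gap, unique, insert_size < read_length)


if __name__ == "__main__":
    # Illustrative values only; not taken from the study.
    for insert in (100, 200, 300, 400):
        print(insert, pair_geometry(insert, read_length=150))
```

With 150-nt reads, for instance, a 200-nt insert gives a 100-nt overlap, a 300-nt insert is tiled exactly, and a 400-nt insert leaves a 100-nt unsequenced gap between mates. These numbers are illustrative, not results from the study.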
id	pubmed-4652379
institution	National Center for Biotechnology Information
record_format	MEDLINE/PubMed
journal	BMC Genomics
published	2015-11-18
rights	© Hara et al. 2015. Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.