RNA sequencing read depth requirement for optimal transcriptome coverage in Hevea brasiliensis

BACKGROUND: One concern in assembling de novo transcriptomes is determining the amount of read sequence required to ensure comprehensive coverage of the genes expressed in a particular sample. In this report, we describe the use of Illumina paired-end RNA-Seq (PE RNA-Seq) reads from Hevea bra...

Bibliographic Details
Main Authors:	Chow, Keng-See; Ghazali, Ahmad-Kamal; Hoh, Chee-Choong; Mohd-Zainuddin, Zainorlina
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Technical Note
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3926681/ https://www.ncbi.nlm.nih.gov/pubmed/24484543 http://dx.doi.org/10.1186/1756-0500-7-69

author	Chow, Keng-See; Ghazali, Ahmad-Kamal; Hoh, Chee-Choong; Mohd-Zainuddin, Zainorlina
collection	PubMed
description	BACKGROUND: One concern in assembling de novo transcriptomes is determining the amount of read sequence required to ensure comprehensive coverage of the genes expressed in a particular sample. In this report, we describe the use of Illumina paired-end RNA-Seq (PE RNA-Seq) reads from Hevea brasiliensis (rubber tree) bark to devise a transcript mapping approach for estimating the read amount needed for deep transcriptome coverage. FINDINGS: We optimized the assembly of a Hevea bark transcriptome based on 16 Gb of Illumina PE RNA-Seq reads using the Oases assembler across a range of k-mer sizes. We then assessed assembly quality based on transcript N50 length and transcript mapping statistics in relation to (a) known Hevea cDNAs with complete open reading frames, (b) a set of core eukaryotic genes and (c) Hevea genome scaffolds. This was followed by a systematic transcript mapping process in which sub-assemblies from a series of incremental amounts of bark transcripts were aligned to transcripts from the entire bark transcriptome assembly. The exercise served to relate read amounts to the level of transcript mapping, the latter being an indicator of the coverage of gene transcripts expressed in the sample. As the read amount (data size) increased toward 16 Gb, the number of transcripts mapped to the entire bark assembly approached saturation. A colour matrix was subsequently generated to illustrate the sequencing depth requirement in relation to the degree of coverage of total sample transcripts. CONCLUSIONS: We devised a procedure, the “transcript mapping saturation test”, to estimate the amount of RNA-Seq reads needed for deep coverage of transcriptomes. For Hevea de novo assembly, we propose generating between 5 and 8 Gb of reads, whereby around 90% transcript coverage could be achieved with optimized k-mers and transcript N50 length. The principle behind this methodology may also be applied to other non-model plants, or with reads from other second-generation sequencing platforms.
format	Online Article Text
id	pubmed-3926681
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3926681 2014-02-18 BMC Res Notes Technical Note BioMed Central 2014-02-01 /pmc/articles/PMC3926681/ /pubmed/24484543 http://dx.doi.org/10.1186/1756-0500-7-69 Text en Copyright © 2014 Chow et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	RNA sequencing read depth requirement for optimal transcriptome coverage in Hevea brasiliensis
topic	Technical Note
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3926681/ https://www.ncbi.nlm.nih.gov/pubmed/24484543 http://dx.doi.org/10.1186/1756-0500-7-69
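
The description field of this record mentions two quantities that a short calculation can make concrete: the transcript N50 length and the "transcript mapping saturation test". The Python sketch below is an illustration only, not the authors' pipeline. It assumes that each sub-assembly has already been aligned to the full assembly and summarized as a count of full-assembly transcripts that were mapped. The function names and all numbers in the demo are hypothetical and are not results from the article.

```python
from typing import Dict, List, Optional


def n50(lengths: List[int]) -> int:
    """Return the N50: the length L such that transcripts of length >= L
    together contain at least half of all assembled bases."""
    if not lengths:
        raise ValueError("no transcript lengths given")
    half = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half:
            return length
    raise AssertionError("unreachable for non-empty input")


def smallest_sufficient_depth(
    mapped_by_gb: Dict[float, int],
    total_transcripts: int,
    target: float = 0.90,
) -> Optional[float]:
    """Return the smallest read amount (in Gb) whose sub-assembly maps to at
    least `target` of the full-assembly transcripts, or None if none does.

    Assumption: mapped_by_gb[gb] is the number of full-assembly transcripts
    matched by the sub-assembly built from `gb` gigabases of reads.
    """
    if total_transcripts <= 0:
        raise ValueError("total_transcripts must be positive")
    for gb in sorted(mapped_by_gb):
        if mapped_by_gb[gb] / total_transcripts >= target:
            return gb
    return None


# Hypothetical, made-up counts used only to exercise the functions.
DEMO_MAPPED = {1: 520, 2: 700, 4: 850, 6: 910, 8: 940, 16: 1000}
DEMO_TOTAL = 1000

if __name__ == "__main__":
    print("N50 of toy lengths:", n50([100, 200, 300, 400]))
    for gb in sorted(DEMO_MAPPED):
        print(f"{gb:>2} Gb: {DEMO_MAPPED[gb] / DEMO_TOTAL:.0%} mapped")
    print("Smallest depth reaching 90%:",
          smallest_sufficient_depth(DEMO_MAPPED, DEMO_TOTAL))
```

Plotting the mapped fraction against read amount gives the saturation curve the description refers to. The threshold of 0.90 mirrors the "around 90% transcript coverage" stated there, but it is a parameter and can be changed.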
