A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing

Bibliographic Details
Main Authors:	Hoang, Nam V., Furtado, Agnelo, Mason, Patrick J., Marquardt, Annelie, Kasirajan, Lakshmi, Thirugnanasambandam, Prathima P., Botha, Frederik C., Henry, Robert J.
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2017
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5440902/ https://www.ncbi.nlm.nih.gov/pubmed/28532419 http://dx.doi.org/10.1186/s12864-017-3757-8

author	Hoang, Nam V. Furtado, Agnelo Mason, Patrick J. Marquardt, Annelie Kasirajan, Lakshmi Thirugnanasambandam, Prathima P. Botha, Frederik C. Henry, Robert J.
collection	PubMed
description	BACKGROUND: Despite the economic importance of sugarcane in sugar and bioenergy production, there is not yet a reference genome available. Most sugarcane transcriptomic studies have been based on Saccharum officinarum gene indices (SoGI), expressed sequence tags (ESTs) and de novo assembled transcript contigs from short reads; hence knowledge of the sugarcane transcriptome is limited in relation to transcript length and number of transcript isoforms. RESULTS: The sugarcane transcriptome was sequenced using PacBio isoform sequencing (Iso-Seq) of a pooled RNA sample derived from leaf, internode and root tissues at different developmental stages from 22 varieties, to explore the potential for capturing full-length transcript isoforms. A total of 107,598 unique transcript isoforms were obtained, representing about 71% of the total number of predicted sugarcane genes. The majority of this dataset (92%) matched the plant protein database, while just over 2% were novel transcripts and over 2% were putative long non-coding RNAs. About 56% and 23% of total sequences were annotated against the gene ontology and KEGG pathway databases, respectively. Comparison with de novo contigs from Illumina RNA sequencing (RNA-Seq) of the internode samples from the same experiment, and with public databases, showed that the Iso-Seq method recovered more full-length transcript isoforms and had a higher N50 and a higher average length of the largest 1,000 proteins, whereas RNA-Seq captured a greater representation of gene content and RNA diversity. Only 62% of PacBio transcript isoforms matched 67% of de novo contigs; the unmatched PacBio fraction was attributed to the inclusion of leaf/root tissues and to normalization in PacBio, and the unmatched de novo fraction to the representation of more gene content and RNA classes in the de novo assembly. About 69% of PacBio transcript isoforms and 41% of de novo contigs aligned with the sorghum genome, indicating high conservation of orthologs in the genic regions of the two genomes. CONCLUSIONS: The transcriptome dataset should contribute to improved sugarcane gene models and protein predictions, and will serve as a reference database for analysis of transcript expression in sugarcane. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-017-3757-8) contains supplementary material, which is available to authorized users.
format	Online Article Text
id	pubmed-5440902
institution	National Center for Biotechnology Information
language	English
publishDate	2017
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-5440902 2017-05-24 BMC Genomics BioMed Central 2017-05-22 /pmc/articles/PMC5440902/ /pubmed/28532419 http://dx.doi.org/10.1186/s12864-017-3757-8 Text en © The Author(s). 2017 Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title	A survey of the complex transcriptome from the highly polyploid sugarcane genome using full-length isoform sequencing and de novo assembly from short read sequencing
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5440902/ https://www.ncbi.nlm.nih.gov/pubmed/28532419 http://dx.doi.org/10.1186/s12864-017-3757-8

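The description above compares the Iso-Seq isoforms with the de novo contigs by N50 and by the average length of the largest 1,000 proteins. The record does not say how the authors computed these figures, so the following is only a minimal Python sketch of the standard definitions, assuming the input is a plain list of sequence lengths (transcript lengths in bases, or predicted protein lengths in amino acids). The function names `n50` and `mean_of_largest` are hypothetical and are not taken from the paper or from any assembly-statistics tool.

```python
from typing import Sequence


def n50(lengths: Sequence[int]) -> int:
    """Return the N50: the largest length L such that sequences of
    length >= L together hold at least half of all bases."""
    if not lengths:
        raise ValueError("lengths must be non-empty")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop at the first sequence that brings the running sum to half the total.
        if 2 * running >= total:
            return length
    raise AssertionError("unreachable for non-negative lengths")


def mean_of_largest(lengths: Sequence[int], k: int = 1000) -> float:
    """Mean of the k largest lengths (all of them if fewer than k exist)."""
    top = sorted(lengths, reverse=True)[:k]
    if not top:
        raise ValueError("lengths must be non-empty")
    return sum(top) / len(top)


if __name__ == "__main__":
    contigs = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy lengths, not data from the study
    print(n50(contigs))                 # 8: 10 + 9 + 8 = 27, half of 54
    print(mean_of_largest(contigs, 3))  # 9.0
```

N50 is weighted toward long sequences, which is why a full-length method such as Iso-Seq can score higher on it even while a short-read assembly captures more distinct genes.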