Transcriptome analysis of extant cotton progenitors revealed tetraploidization and identified genome-specific single nucleotide polymorphism in diploid and allotetraploid cotton

BACKGROUND: The most widely cultivated cotton (Gossypium hirsutum L., AD-genome) is derived from tetraploidization between A- and D-genome species. G. arboreum L. (A-genome) and G. raimondii Ulbr. (D-genome) are two of closely-related extant progenitors. Gene expression studies in allotetraploid cot...

Bibliographic Details
Main Authors: Guan, Xueying, Nah, Gyoungju, Song, Qingxin, Udall, Joshua A, Stelly, David M, Chen, Z Jeffrey
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4267057/
https://www.ncbi.nlm.nih.gov/pubmed/25099166
http://dx.doi.org/10.1186/1756-0500-7-493
author Guan, Xueying
Nah, Gyoungju
Song, Qingxin
Udall, Joshua A
Stelly, David M
Chen, Z Jeffrey
author_sort Guan, Xueying
collection PubMed
description BACKGROUND: The most widely cultivated cotton (Gossypium hirsutum L., AD-genome) is derived from tetraploidization between A- and D-genome species. G. arboreum L. (A-genome) and G. raimondii Ulbr. (D-genome) are two of closely-related extant progenitors. Gene expression studies in allotetraploid cotton are complicated by the homoeologous loci of A- and D-genome origins. To develop genomic resources for gene expression and cotton breeding, we sequenced and assembled expressed sequence tags (ESTs) derived from G. arboreum and G. raimondii. RESULTS: Roche/454 FLX sequencing technology was employed to sequence normalized cDNA libraries prepared from leaves, roots, bolls, ovules, and fibers in G. arboreum and G. raimondii, respectively. Sequencing reads from two independent libraries in each species were combined to assemble high-quality EST contigs. The combined sequencing reads included 1,699,776 from A-genome and 1,464,815 from D-genome, which were clustered into 89,588 contigs in the A-genome and 65,542 contigs in the D-genome. These contigs represented ~80% of EST collections in Cotton Gene Index 11 (CGI11, March 2011). Compared to the D-genome transcript database, 27,537 and 10,452 contigs were unique transcripts in A and D genomes, respectively. Further analysis using self-blastn reduced the unigene contig number by 52% in A-genome and 57% in D-genome, suggesting that 50% or more of contigs are paralogs or isoforms within each species. The majority of EST contigs (73–81%) were conserved between A- and D-genomes, whereas 27% and 19% contigs were specific to A- and D-genomes, respectively. Using these ESTs, we generated a total of 75,754 genome-specific single nucleotide polymorphism (SNP) (gSNPs or GNPs) or homoeologous-specific SNPs (hSNPs) of 10,885 contigs or genes between A and D genomes, indicating a possibility of separating allelic expression for those genes in allotetraploid cotton. 
CONCLUSIONS: Expressed genes are highly redundant within each diploid progenitor and between A and D progenitor species, suggesting that diploid progenitors in cotton are likely ancient tetraploids. This large set of A- and D-genome ESTs and GNPs will be valuable resources for genome annotation, gene expression, and crop improvement in allotetraploid cotton. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/1756-0500-7-493) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4267057
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4267057 2014-12-17 BMC Res Notes Research Article BioMed Central 2014-08-06 /pmc/articles/PMC4267057/ /pubmed/25099166 http://dx.doi.org/10.1186/1756-0500-7-493 Text en © Guan et al.; licensee BioMed Central Ltd. 2014 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Transcriptome analysis of extant cotton progenitors revealed tetraploidization and identified genome-specific single nucleotide polymorphism in diploid and allotetraploid cotton
topic Research Article