
High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome

BACKGROUND: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline to...


Bibliographic Details
Main Authors: Novaes, Evandro, Drost, Derek R, Farmerie, William G, Pappas, Georgios J, Grattapaglia, Dario, Sederoff, Ronald R, Kirst, Matias
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2483731/
https://www.ncbi.nlm.nih.gov/pubmed/18590545
http://dx.doi.org/10.1186/1471-2164-9-312
_version_ 1782158064963551232
author Novaes, Evandro
Drost, Derek R
Farmerie, William G
Pappas, Georgios J
Grattapaglia, Dario
Sederoff, Ronald R
Kirst, Matias
author_facet Novaes, Evandro
Drost, Derek R
Farmerie, William G
Pappas, Georgios J
Grattapaglia, Dario
Sederoff, Ronald R
Kirst, Matias
author_sort Novaes, Evandro
collection PubMed
description BACKGROUND: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effectively the sequencing of large numbers of short reads for species with essentially no prior gene sequence information will support contig assemblies and sequence annotation. RESULTS: With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool composed of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average 4x smaller than in synonymous, suggesting purifying selection. Non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection in agreement with previous research.
CONCLUSION: In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition, we demonstrated that SNPs sampled at large scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species.
format Text
id pubmed-2483731
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-24837312008-07-28 High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome Novaes, Evandro Drost, Derek R Farmerie, William G Pappas, Georgios J Grattapaglia, Dario Sederoff, Ronald R Kirst, Matias BMC Genomics Research Article BACKGROUND: Benefits from high-throughput sequencing using 454 pyrosequencing technology may be most apparent for species with high societal or economic value but few genomic resources. Rapid means of gene sequence and SNP discovery using this novel sequencing technology provide a set of baseline tools for genome-level research. However, it is questionable how effective the sequencing of large numbers of short reads for species with essentially no prior gene sequence information will support contig assemblies and sequence annotation. RESULTS: With the purpose of generating the first broad survey of gene sequences in Eucalyptus grandis, the most widely planted hardwood tree species, we used 454 technology to sequence and assemble 148 Mbp of expressed sequences (EST). EST sequences were generated from a normalized cDNA pool comprised of multiple tissues and genotypes, promoting discovery of homologues to almost half of Arabidopsis genes, and a comprehensive survey of allelic variation in the transcriptome. By aligning the sequencing reads from multiple genotypes we detected 23,742 SNPs, 83% of which were validated in a sample. Genome-wide nucleotide diversity was estimated for 2,392 contigs using a modified theta (θ) parameter, adapted for measuring genetic diversity from polymorphisms detected by randomly sequencing a multi-genotype cDNA pool. Diversity estimates in non-synonymous nucleotides were on average 4x smaller than in synonymous, suggesting purifying selection. Non-synonymous to synonymous substitutions (Ka/Ks) among 2,001 contigs averaged 0.30 and was skewed to the right, further supporting that most genes are under purifying selection. 
Comparison of these estimates among contigs identified major functional classes of genes under purifying and diversifying selection in agreement with previous research. CONCLUSION: In providing an abundance of foundational transcript sequences where limited prior genomic information existed, this work created part of the foundation for the annotation of the E. grandis genome that is being sequenced by the US Department of Energy. In addition, we demonstrated that SNPs sampled at large scale with 454 pyrosequencing can be used to detect evolutionary signatures among genes, providing one of the first genome-wide assessments of nucleotide diversity and Ka/Ks for a non-model plant species. BioMed Central 2008-06-30 /pmc/articles/PMC2483731/ /pubmed/18590545 http://dx.doi.org/10.1186/1471-2164-9-312 Text en Copyright © 2008 Novaes et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Novaes, Evandro
Drost, Derek R
Farmerie, William G
Pappas, Georgios J
Grattapaglia, Dario
Sederoff, Ronald R
Kirst, Matias
High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome
title High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome
title_full High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome
title_fullStr High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome
title_full_unstemmed High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome
title_short High-throughput gene and SNP discovery in Eucalyptus grandis, an uncharacterized genome
title_sort high-throughput gene and snp discovery in eucalyptus grandis, an uncharacterized genome
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2483731/
https://www.ncbi.nlm.nih.gov/pubmed/18590545
http://dx.doi.org/10.1186/1471-2164-9-312