Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs
BACKGROUND: High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate Single Nucleotide Polymorphisms (SNP) in white spruce (Picea glauca [Moench...
Main authors: | Pavy, Nathalie; Parsons, Lee S; Paule, Charles; MacKay, John; Bousquet, Jean |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2006 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557672/ https://www.ncbi.nlm.nih.gov/pubmed/16824208 http://dx.doi.org/10.1186/1471-2164-7-174 |
_version_ | 1782129393156489216 |
---|---|
author | Pavy, Nathalie; Parsons, Lee S; Paule, Charles; MacKay, John; Bousquet, Jean |
author_sort | Pavy, Nathalie |
collection | PubMed |
description | BACKGROUND: High-throughput genotyping technologies represent a highly efficient way to accelerate genetic mapping and enable association studies. As a first step toward this goal, we aimed to develop a resource of candidate Single Nucleotide Polymorphisms (SNPs) in white spruce (Picea glauca [Moench] Voss), a softwood tree of major economic importance. RESULTS: A white spruce SNP resource encompassing 12,264 SNPs was constructed from a set of 6,459 contigs derived from Expressed Sequence Tags (ESTs) by using the Bayesian statistical software PolyBayes. Several parameters influencing the SNP prediction were analysed, including the a priori expected polymorphism, the probability score (P(SNP)), and the contig depth and length. SNP detection in 3' and 5' reads from the same clones revealed a level of inconsistency between overlapping sequences as low as 1%. A subset of 245 predicted SNPs was verified through the independent resequencing of genomic DNA of a genotype also used to prepare cDNA libraries. The validation rate reached a maximum of 85% for SNPs predicted with either P(SNP) ≥ 0.95 or P(SNP) ≥ 0.99. A total of 9,310 SNPs were detected by using P(SNP) ≥ 0.95 as a criterion. The SNPs were distributed among 3,590 contigs encompassing an array of broad functional categories, with an overall frequency of 1 SNP per 700 nucleotide sites. Experimental and statistical approaches were used to evaluate the proportion of paralogous SNPs, with estimates in the range of 8 to 12%. The 3,789 coding SNPs identified through coding region annotation and ORF prediction were distributed into 39% nonsynonymous and 61% synonymous substitutions. Overall, there were 0.9 SNPs per 1,000 nonsynonymous sites and 5.2 SNPs per 1,000 synonymous sites, for a genome-wide nonsynonymous to synonymous substitution rate ratio (Ka/Ks) of 0.17. CONCLUSION: We integrated the SNP data in the ForestTreeDB database along with functional annotations to provide a tool facilitating the choice of candidate genes for mapping purposes or association studies. |
format | Text |
id | pubmed-1557672 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-1557672 2006-08-31 BMC Genomics Research Article BioMed Central 2006-07-06 /pmc/articles/PMC1557672/ /pubmed/16824208 http://dx.doi.org/10.1186/1471-2164-7-174 Text en Copyright © 2006 Pavy et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Automated SNP detection from a large collection of white spruce expressed sequences: contributing factors and approaches for the categorization of SNPs |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1557672/ https://www.ncbi.nlm.nih.gov/pubmed/16824208 http://dx.doi.org/10.1186/1471-2164-7-174 |
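
The summary statistics quoted in the description can be cross-checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline: the constants are the figures stated in the abstract, while the function and variable names are hypothetical and introduced only for illustration. Because the abstract reports rounded percentages, the derived counts are approximate.

```python
# Figures quoted in the abstract; all names below are illustrative assumptions.
NONSYN_SNPS_PER_1000_SITES = 0.9   # SNPs per 1,000 nonsynonymous sites
SYN_SNPS_PER_1000_SITES = 5.2      # SNPs per 1,000 synonymous sites
TOTAL_PREDICTED_SNPS = 12_264      # SNPs in the full resource
SNPS_AT_P95 = 9_310                # SNPs retained with P(SNP) >= 0.95
CODING_SNPS = 3_789                # SNPs located in coding regions


def ka_ks_ratio(nonsyn_rate: float, syn_rate: float) -> float:
    """Ratio of the nonsynonymous to the synonymous substitution rate."""
    if syn_rate <= 0:
        raise ValueError("synonymous rate must be positive")
    return nonsyn_rate / syn_rate


def approximate_count(total: int, percent: float) -> int:
    """Convert a rounded percentage back into an approximate count."""
    return round(total * percent / 100)


ka_ks = ka_ks_ratio(NONSYN_SNPS_PER_1000_SITES, SYN_SNPS_PER_1000_SITES)
share_at_p95 = SNPS_AT_P95 / TOTAL_PREDICTED_SNPS
nonsyn_count = approximate_count(CODING_SNPS, 39)
syn_count = approximate_count(CODING_SNPS, 61)

print(f"Ka/Ks = {ka_ks:.2f}")
print(f"Share of SNPs with P(SNP) >= 0.95: {share_at_p95:.1%}")
print(f"Approx. nonsynonymous coding SNPs: {nonsyn_count}")
print(f"Approx. synonymous coding SNPs: {syn_count}")
```

The ratio 0.9 / 5.2 rounds to the 0.17 reported in the abstract, and about three quarters of the 12,264 predicted SNPs pass the P(SNP) ≥ 0.95 criterion.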