The effects of contig length and depth on the estimation of SNP frequencies, and the relative abundance of SNPs in protein-coding and non-coding transcripts of tiger salamanders (Ambystoma tigrinum)

BACKGROUND: Next-generation sequencing methods have contributed to rapid progress in the fields of genomics and population genetics. Using this high-throughput and cost-effective technology, a number of studies have estimated single nucleotide polymorphism (SNP) frequency by calculating the mean num...

Bibliographic Details
Main authors: Eo, Soo Hyung, DeWoody, J Andrew
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3416719/
https://www.ncbi.nlm.nih.gov/pubmed/22716167
http://dx.doi.org/10.1186/1471-2164-13-259
author Eo, Soo Hyung
DeWoody, J Andrew
collection PubMed
description BACKGROUND: Next-generation sequencing methods have contributed to rapid progress in the fields of genomics and population genetics. Using this high-throughput and cost-effective technology, a number of studies have estimated single nucleotide polymorphism (SNP) frequency by calculating the mean number of SNPs per unit sequence length (e.g., mean SNPs/kb). However, both read length and contig depth are highly variable and thus raise doubt about simple methods of SNP frequency estimation. RESULTS: We used 454 pyrosequencing to identify 2,980 putative SNPs in the eastern tiger salamander (Ambystoma tigrinum tigrinum) transcriptome, then constructed analytical models to estimate SNP frequency. The model which considered only contig length (i.e., the method employed in most published papers) was evaluated with very poor likelihood. Our most robust model considered read depth as well as contig length, and was 7.5 × 10^55 times more likely than the length-only model. Using this novel modeling approach, we estimated SNP frequency in protein-coding (mRNA) and non-coding transcripts (e.g., small RNAs). We found little difference in SNP frequency in the contigs, but we found a trend of a higher frequency of SNPs in long contigs representing non-coding transcripts relative to protein-coding transcripts. These results support the hypothesis that long non-coding transcripts are less conserved than long protein-coding transcripts. CONCLUSIONS: A modeling approach (i.e., using multiple model construction and model selection approaches) can be a powerful tool for identifying selection on specific functional sequence groups by comparing the frequency and distribution of polymorphisms.
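The abstract contrasts a length-only estimate (mean SNPs/kb) with a model that also accounts for read depth, and compares them by relative likelihood. The sketch below illustrates that kind of comparison under stated assumptions; it is not the authors' model. It assumes SNP counts per contig are Poisson-distributed, treats "length x depth" as a simple exposure term, compares the two fits with AIC, and uses a hypothetical toy data set. If the reported ratio were an AIC-based evidence ratio (an assumption, since the abstract does not say), 7.5 × 10^55 would correspond to an AIC difference of roughly 257.

```python
import math


def poisson_loglik(counts, exposures, rate):
    """Log-likelihood of counts under Poisson(rate * exposure)."""
    total = 0.0
    for k, e in zip(counts, exposures):
        mu = rate * e
        total += k * math.log(mu) - mu - math.lgamma(k + 1)
    return total


def fit_rate_model(counts, exposures):
    """Closed-form MLE for a one-parameter Poisson rate model.

    Returns (rate, log-likelihood, AIC). The MLE is sum(counts) / sum(exposures).
    """
    rate = sum(counts) / sum(exposures)
    loglik = poisson_loglik(counts, exposures, rate)
    aic = 2 * 1 - 2 * loglik  # one free parameter
    return rate, loglik, aic


def evidence_ratio(aic_a, aic_b):
    """How many times more likely model B is than model A, given their AICs."""
    return math.exp((aic_a - aic_b) / 2.0)


# Hypothetical toy contigs: (length in bp, mean read depth, SNP count).
contigs = [
    (500, 4, 0), (800, 6, 1), (1200, 30, 6), (300, 3, 0),
    (2000, 12, 4), (900, 40, 7), (1500, 5, 1), (700, 20, 3),
]
lengths = [c[0] for c in contigs]
depths = [c[1] for c in contigs]
snps = [c[2] for c in contigs]

# Model 1 (assumed): expected SNPs proportional to contig length only.
rate_len, ll_len, aic_len = fit_rate_model(snps, lengths)

# Model 2 (assumed): expected SNPs proportional to length x depth.
length_depth = [l * d for l, d in zip(lengths, depths)]
rate_ld, ll_ld, aic_ld = fit_rate_model(snps, length_depth)

print(f"length-only estimate: {rate_len * 1000:.3f} SNPs/kb, AIC = {aic_len:.2f}")
print(f"length x depth model: AIC = {aic_ld:.2f}")
print(f"evidence ratio (length x depth vs. length-only): "
      f"{evidence_ratio(aic_len, aic_ld):.3g}")

# AIC difference implied by a 7.5e55 evidence ratio, if it were AIC-based.
print(f"implied AIC difference: {2 * math.log(7.5e55):.1f}")
```

Both toy models have one free parameter, so the AIC comparison here reduces to a likelihood comparison. A model with separate coefficients for length and depth would need a numerical optimizer and an extra parameter in the AIC penalty.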
format Online
Article
Text
id pubmed-3416719
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3416719 2012-08-11 The effects of contig length and depth on the estimation of SNP frequencies, and the relative abundance of SNPs in protein-coding and non-coding transcripts of tiger salamanders (Ambystoma tigrinum) Eo, Soo Hyung DeWoody, J Andrew BMC Genomics Research Article
BioMed Central 2012-06-20 /pmc/articles/PMC3416719/ /pubmed/22716167 http://dx.doi.org/10.1186/1471-2164-13-259 Text en Copyright ©2012 Eo and DeWoody et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title The effects of contig length and depth on the estimation of SNP frequencies, and the relative abundance of SNPs in protein-coding and non-coding transcripts of tiger salamanders (Ambystoma tigrinum)
topic Research Article