Identification of single nucleotide polymorphisms from the transcriptome of an organism with a whole genome duplication

BACKGROUND: The common ancestor of salmonid fishes, including rainbow trout (Oncorhynchus mykiss), experienced a whole genome duplication between 20 and 100 million years ago, and many of the duplicated genes have been retained in the trout genome. This retention complicates efforts to detect alleli...

Bibliographic Details
Main authors: Christensen, Kris A; Brunelli, Joseph P; Lambert, Matthew J; DeKoning, Jenefer; Phillips, Ruth B; Thorgaard, Gary H
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3840595/
https://www.ncbi.nlm.nih.gov/pubmed/24237905
http://dx.doi.org/10.1186/1471-2105-14-325
author Christensen, Kris A
Brunelli, Joseph P
Lambert, Matthew J
DeKoning, Jenefer
Phillips, Ruth B
Thorgaard, Gary H
collection PubMed
description BACKGROUND: The common ancestor of salmonid fishes, including rainbow trout (Oncorhynchus mykiss), experienced a whole genome duplication between 20 and 100 million years ago, and many of the duplicated genes have been retained in the trout genome. This retention complicates efforts to detect allelic variation in salmonid fishes. Specifically, single nucleotide polymorphism (SNP) detection is problematic because nucleotide variation can be found between the duplicate copies (paralogs) of a gene as well as between alleles. RESULTS: We present a method of differentiating between allelic and paralogous (gene copy) sequence variants, allowing identification of SNPs in organisms with multiple copies of a gene or set of genes. The basic strategy is to: 1) identify windows of unique cDNA sequences with homology to each other, 2) compare these unique cDNAs if they are not shared between individuals (i.e. the cDNA is homozygous in one individual and homozygous for another cDNA in the other individual), and 3) give a “SNP score” value between zero and one to each candidate sequence variant based on six criteria. Using this strategy we were able to detect about seven thousand potential SNPs from the transcriptomes of several clonal lines of rainbow trout. When directly compared to a pre-validated set of SNPs in polyploid wheat, we were also able to estimate the false-positive rate of this strategy as 0 to 28% depending on parameters used. CONCLUSIONS: This strategy has an advantage over traditional techniques of SNP identification because another dimension of sequencing information is utilized. This method is especially well suited for identifying SNPs in polyploids, both outbred and inbred, but would tend to be conservative for diploid organisms.
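Step 2 of the strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `candidate_variants`, the input layout (one homologous window, mapping each clonal line to the set of distinct cDNA sequences observed in it), and the assumption that sequences in a window are already aligned to equal length are all hypothetical. The six scoring criteria are not listed in the abstract, so the "SNP score" is not modeled here.

```python
from itertools import combinations


def candidate_variants(window):
    """Return (position, base_a, base_b) tuples for candidate allelic variants.

    `window` is a hypothetical input: a dict mapping an individual (clonal
    line) to the set of distinct, equal-length cDNA sequences seen in one
    window of homologous sequence.
    """
    # Record which individuals carry each distinct sequence.
    carriers = {}
    for individual, seqs in window.items():
        for seq in seqs:
            carriers.setdefault(seq, set()).add(individual)

    hits = []
    for a, b in combinations(sorted(carriers), 2):
        # If any individual carries both sequences, the difference is
        # treated as paralogous (gene copies), not allelic, and skipped.
        if carriers[a] & carriers[b]:
            continue
        # Otherwise the two sequences are not shared between individuals,
        # so each mismatch is a candidate SNP.
        for pos, (x, y) in enumerate(zip(a, b)):
            if x != y:
                hits.append((pos, x, y))
    return hits


# Illustrative data only (not from the paper).
example = {
    "line1": {"ACGTAC", "ACGTTC"},
    "line2": {"ACGTAC", "ACCTTC"},
}
print(candidate_variants(example))
```

In this toy window, the sequence found in both lines is paired only with sequences from a line that also carries it, so those differences are set aside as paralogous. Only the two line-specific sequences are compared with each other.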
format Online
Article
Text
id pubmed-3840595
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3840595 2013-11-27 BMC Bioinformatics Methodology Article BioMed Central 2013-11-16 /pmc/articles/PMC3840595/ /pubmed/24237905 http://dx.doi.org/10.1186/1471-2105-14-325 Text en Copyright © 2013 Christensen et al.; licensee BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Identification of single nucleotide polymorphisms from the transcriptome of an organism with a whole genome duplication
topic Methodology Article