
Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success

Bibliographic Details
Main authors: Humble, Emily; Thorne, Michael A. S.; Forcada, Jaume; Hoffman, Joseph I.
Format: Online, Article, Text
Language: English
Published: BioMed Central 2016
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5000416/
https://www.ncbi.nlm.nih.gov/pubmed/27562535
http://dx.doi.org/10.1186/s13104-016-2209-x
author Humble, Emily
Thorne, Michael A. S.
Forcada, Jaume
Hoffman, Joseph I.
collection PubMed
description BACKGROUND: Single nucleotide polymorphism (SNP) discovery is an important goal of many studies. However, the number of ‘putative’ SNPs discovered from a sequence resource may not provide a reliable indication of the number that will successfully validate with a given genotyping technology. For this it may be necessary to account for factors such as the method used for SNP discovery and the type of sequence data from which it originates, suitability of the SNP flanking sequences for probe design, and genomic context. To explore the relative importance of these and other factors, we used Illumina sequencing to augment an existing Roche 454 transcriptome assembly for the Antarctic fur seal (Arctocephalus gazella). We then mapped the raw Illumina reads to the new hybrid transcriptome using BWA and BOWTIE2 before calling SNPs with GATK. The resulting markers were pooled with two existing sets of SNPs called from the original 454 assembly using NEWBLER and SWAP454. Finally, we explored the extent to which SNPs discovered using these four methods overlapped and predicted the corresponding validation outcomes for both Illumina Infinium iSelect HD and Affymetrix Axiom arrays. RESULTS: Collating markers across all discovery methods resulted in a global list of 34,718 SNPs. However, concordance between the methods was surprisingly poor, with only 51.0 % of SNPs being discovered by more than one method and 13.5 % being called from both the 454 and Illumina datasets. Using a predictive modeling approach, we could also show that SNPs called from the Illumina data were on average more likely to successfully validate, as were SNPs called by more than one method. Above and beyond this pattern, predicted validation outcomes were also consistently better for Affymetrix Axiom arrays. CONCLUSIONS: Our results suggest that focusing on SNPs called by more than one method could potentially improve validation outcomes. 
They also highlight possible differences between alternative genotyping technologies that could be explored in future studies of non-model organisms. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13104-016-2209-x) contains supplementary material, which is available to authorized users.
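The abstract reports two concordance figures: the share of SNPs discovered by more than one method, and the share called from both the 454 and the Illumina datasets. The minimal Python sketch below shows how such figures can be computed from per-method call sets. It assumes every call can be reduced to a (contig, position) key on one shared reference, which in practice would require mapping the 454-assembly calls onto the hybrid transcriptome first. The method labels mirror the abstract, but the coordinates, the `CALLS` and `DATASET` dictionaries and the `concordance()` helper are hypothetical toy constructs, not the authors' pipeline or data.

```python
from collections import Counter

# Hypothetical toy call sets (invented coordinates): each SNP is keyed by
# (contig, position), assuming all methods share one reference.
CALLS = {
    "NEWBLER": {("contig1", 120), ("contig1", 305), ("contig2", 44)},
    "SWAP454": {("contig1", 120), ("contig3", 10)},
    "BWA_GATK": {("contig1", 120), ("contig2", 44), ("contig4", 77)},
    "BOWTIE2_GATK": {("contig2", 44), ("contig4", 77), ("contig5", 9)},
}

# Sequence dataset each discovery method drew on.
DATASET = {
    "NEWBLER": "454",
    "SWAP454": "454",
    "BWA_GATK": "Illumina",
    "BOWTIE2_GATK": "Illumina",
}


def concordance(calls, dataset):
    """Return (size of the global SNP list,
    fraction of SNPs called by more than one method,
    fraction of SNPs called from both the 454 and Illumina datasets)."""
    # Number of methods that called each SNP; keys form the global list.
    method_counts = Counter(snp for snps in calls.values() for snp in snps)
    total = len(method_counts)
    multi = sum(1 for n in method_counts.values() if n > 1)

    # Pool SNPs by sequence dataset, then intersect the two pools.
    by_data = {}
    for method, snps in calls.items():
        by_data.setdefault(dataset[method], set()).update(snps)
    both = len(by_data.get("454", set()) & by_data.get("Illumina", set()))
    return total, multi / total, both / total


total, multi_frac, cross_frac = concordance(CALLS, DATASET)
print(total, multi_frac, round(cross_frac, 3))
```

On this toy input the global list holds 6 SNPs, half are called by more than one method, and a third are shared between the two datasets. The SNP at ("contig4", 77) shows why the two figures differ: two Illumina-based methods agree on it, so it counts as multi-method, yet it is absent from the 454 calls.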
format Online
Article
Text
id pubmed-5000416
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5000416 2016-08-27 BMC Res Notes Technical Note BioMed Central 2016-08-26 /pmc/articles/PMC5000416/ /pubmed/27562535 http://dx.doi.org/10.1186/s13104-016-2209-x Text en © The Author(s) 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Transcriptomic SNP discovery for custom genotyping arrays: impacts of sequence data, SNP calling method and genotyping technology on the probability of validation success
topic Technical Note