
High throughput sequencing in mice: a platform comparison identifies a preponderance of cryptic SNPs

BACKGROUND: Allelic variation is the cornerstone of genetically determined differences in gene expression, gene product structure, physiology, and behavior. However, allelic variation, particularly cryptic (unknown or not annotated) variation, is problematic for follow-up analyses. Polymorphisms res...

Bibliographic Details
Main Authors: Walter, Nicole AR, Bottomly, Daniel, Laderas, Ted, Mooney, Michael A, Darakjian, Priscila, Searles, Robert P, Harrington, Christina A, McWeeney, Shannon K, Hitzemann, Robert, Buck, Kari J
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743714/
https://www.ncbi.nlm.nih.gov/pubmed/19686600
http://dx.doi.org/10.1186/1471-2164-10-379
author Walter, Nicole AR
Bottomly, Daniel
Laderas, Ted
Mooney, Michael A
Darakjian, Priscila
Searles, Robert P
Harrington, Christina A
McWeeney, Shannon K
Hitzemann, Robert
Buck, Kari J
collection PubMed
description BACKGROUND: Allelic variation is the cornerstone of genetically determined differences in gene expression, gene product structure, physiology, and behavior. However, allelic variation, particularly cryptic (unknown or not annotated) variation, is problematic for follow-up analyses. Polymorphisms result in a high incidence of false positive and false negative results in hybridization-based analyses and hinder the identification of the true variation underlying genetically determined differences in physiology and behavior. Given the proliferation of mouse genetic models (e.g., knockout models, selectively bred lines, heterogeneous stocks derived from standard inbred strains and wild mice) and the wealth of gene expression microarray and phenotypic studies using genetic models, the impact of naturally-occurring polymorphisms on these data is critical. With the advent of next-generation, high-throughput sequencing, we are now in a position to determine to what extent polymorphisms are currently cryptic in such models and their impact on downstream analyses.
RESULTS: We sequenced the two most commonly used inbred mouse strains, DBA/2J and C57BL/6J, across a region of chromosome 1 (171.6 – 174.6 megabases) using two next-generation high-throughput sequencing platforms: Applied Biosystems (SOLiD) and Illumina (Genome Analyzer). Using the same templates on both platforms, we compared realignments and single nucleotide polymorphism (SNP) detection with an 80-fold average read depth across platforms and samples. While public datasets currently annotate 4,527 SNPs between the two strains in this interval, thorough high-throughput sequencing identified a total of 11,824 SNPs in the interval, including 7,663 new SNPs. Furthermore, we confirmed 40 missense SNPs and discovered 36 new missense SNPs.
CONCLUSION: Comparisons utilizing even two of the best characterized mouse genetic models, DBA/2J and C57BL/6J, indicate that more than half of naturally-occurring SNPs remain cryptic. The magnitude of this problem is compounded when using more divergent or poorly annotated genetic models. This warrants full genomic sequencing of the mouse strains used as genetic models.
format Text
id pubmed-2743714
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2743714 2009-09-15 BMC Genomics Research Article BioMed Central 2009-08-17 /pmc/articles/PMC2743714/ /pubmed/19686600 http://dx.doi.org/10.1186/1471-2164-10-379 Text en Copyright © 2009 Walter et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title High throughput sequencing in mice: a platform comparison identifies a preponderance of cryptic SNPs
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2743714/
https://www.ncbi.nlm.nih.gov/pubmed/19686600
http://dx.doi.org/10.1186/1471-2164-10-379