
Application of massive parallel sequencing to whole genome SNP discovery in the porcine genome

BACKGROUND: Although the Illumina 1 G Genome Analyzer generates billions of base pairs of sequence data, challenges arise in sequence selection due to the varying sequence quality. Therefore, in the framework of the International Porcine SNP Chip Consortium, this pilot study aimed to evaluate the im...


Bibliographic Details
Main authors: Amaral, Andreia J; Megens, Hendrik-Jan; Kerstens, Hindrik HD; Heuven, Henri CM; Dibbits, Bert; Crooijmans, Richard PMA; den Dunnen, Johan T; Groenen, Martien AM
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2739861/
https://www.ncbi.nlm.nih.gov/pubmed/19674453
http://dx.doi.org/10.1186/1471-2164-10-374
_version_ 1782171622741901312
author Amaral, Andreia J
Megens, Hendrik-Jan
Kerstens, Hindrik HD
Heuven, Henri CM
Dibbits, Bert
Crooijmans, Richard PMA
den Dunnen, Johan T
Groenen, Martien AM
author_sort Amaral, Andreia J
collection PubMed
description BACKGROUND: Although the Illumina 1 G Genome Analyzer generates billions of base pairs of sequence data, challenges arise in sequence selection due to the varying sequence quality. Therefore, in the framework of the International Porcine SNP Chip Consortium, this pilot study aimed to evaluate the impact of the quality level of the sequenced bases on mapping quality and identification of true SNPs on a large scale. RESULTS: DNA pooled from five animals from a commercial boar line was digested with DraI; 150–250-bp fragments were isolated and end-sequenced using the Illumina 1 G Genome Analyzer, yielding 70,348,064 sequences 36-bp long. Rules were developed to select sequences, which were then aligned to unique positions in a reference genome. Sequences were selected based on quality, and three thresholds of sequence quality (SQ) were compared. The highest threshold of SQ allowed identification of a larger number of SNPs (17,489), distributed widely across the pig genome. In total, 3,142 SNPs were validated with a success rate of 96%. The correlation between estimated minor allele frequency (MAF) and genotyped MAF was moderate, and SNPs were highly polymorphic in other pig breeds. Lowering the SQ threshold and maintaining the same criteria for SNP identification resulted in the discovery of fewer SNPs (16,768), of which 259 were not identified using higher SQ levels. Validation of SNPs found exclusively in the lower SQ threshold had a success rate of 94% and a low correlation between estimated MAF and genotyped MAF. Base change analysis suggested that the rate of transitions in the pig genome is likely to be similar to that observed in humans. Chromosome X showed reduced nucleotide diversity relative to autosomes, as observed for other species. CONCLUSION: Large numbers of SNPs can be identified reliably by creating strict rules for sequence selection, which simultaneously decreases sequence ambiguity. Selection of sequences using a higher SQ threshold leads to more reliable identification of SNPs. Lower SQ thresholds can be used to guarantee sufficient sequence coverage, resulting in high success rate but less reliable MAF estimation. Nucleotide diversity varies between porcine chromosomes, with the X chromosome showing less variation as observed in other species.
format Text
id pubmed-2739861
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2739861 2009-09-09 BMC Genomics Methodology Article BioMed Central 2009-08-12 /pmc/articles/PMC2739861/ /pubmed/19674453 http://dx.doi.org/10.1186/1471-2164-10-374 Text en Copyright © 2009 Amaral et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Application of massive parallel sequencing to whole genome SNP discovery in the porcine genome
topic Methodology Article