A model-based approach to selection of tag SNPs

BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphisms found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the p...

Bibliographic Details
Main authors:	Nicolas, Pierre; Sun, Fengzhu; Li, Lei M
Format:	Text
Language:	English
Published:	BioMed Central 2006
Subjects:	Methodology Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525207/ https://www.ncbi.nlm.nih.gov/pubmed/16776821 http://dx.doi.org/10.1186/1471-2105-7-303

author	Nicolas, Pierre; Sun, Fengzhu; Li, Lei M
author_sort	Nicolas, Pierre
collection	PubMed
description	BACKGROUND: Single Nucleotide Polymorphisms (SNPs) are the most common type of polymorphism found in the human genome. Effective genetic association studies require the identification of sets of tag SNPs that capture as much haplotype information as possible. Tag SNP selection is analogous to the problem of data compression in information theory. According to Shannon's framework, the optimal tag set maximizes the entropy of the tag SNPs subject to constraints on the number of SNPs. This approach requires an appropriate probabilistic model. Compared to simple measures of Linkage Disequilibrium (LD), a good model of haplotype sequences can more accurately account for LD structure. It also provides machinery for predicting tagged SNPs, and thereby for assessing the performance of tag sets through their ability to predict larger SNP sets. RESULTS: Here, we compute the description code-lengths of SNP data for an array of models, and we develop tag SNP selection methods based on these models and the strategy of entropy maximization. Using data sets from the HapMap and ENCODE projects, we show that the hidden Markov model introduced by Li and Stephens outperforms the other models in several aspects: description code-length of SNP data, information content of tag sets, and prediction of tagged SNPs. This is the first use of this model in the context of tag SNP selection. CONCLUSION: Our study provides strong evidence that the tag sets selected by our best method, based on the Li and Stephens model, outperform those chosen by several existing methods. The results also suggest that information content evaluated with a good model is more sensitive for assessing the quality of a tagging set than the correct prediction rate of tagged SNPs. We also show that haplotype phase uncertainty has an almost negligible impact on the ability of good tag sets to predict tagged SNPs. This justifies the selection of tag SNPs on the basis of haplotype informativeness, although genotyping studies do not directly assess haplotypes. Software implementing our approach is available.
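
The entropy-maximization idea in the description can be illustrated with a minimal Python sketch. It is not the authors' software and it does not use the Li and Stephens hidden Markov model: it assumes phased haplotypes coded as 0/1 tuples, estimates entropy from empirical haplotype frequencies, and picks tags greedily. The function names `joint_entropy` and `greedy_tag_snps` and the toy data are hypothetical, chosen only for illustration.

```python
from collections import Counter
from math import log2


def joint_entropy(haplotypes, snps):
    """Empirical Shannon entropy (bits) of the haplotypes restricted to `snps`."""
    counts = Counter(tuple(h[i] for i in snps) for h in haplotypes)
    n = len(haplotypes)
    return -sum((c / n) * log2(c / n) for c in counts.values())


def greedy_tag_snps(haplotypes, k):
    """Greedily pick up to k SNP indices that maximize empirical joint entropy.

    Assumption: a greedy search stands in for exact maximization, and
    empirical frequencies stand in for a fitted haplotype model.
    """
    n_snps = len(haplotypes[0])
    selected = []
    for _ in range(min(k, n_snps)):
        remaining = [i for i in range(n_snps) if i not in selected]
        # Add the SNP whose inclusion yields the largest joint entropy.
        best = max(remaining, key=lambda i: joint_entropy(haplotypes, selected + [i]))
        selected.append(best)
    return sorted(selected)


# Toy phased haplotypes (rows) over 4 SNPs (columns); alleles coded 0/1.
# SNPs 0 and 1 are in perfect LD, as are SNPs 2 and 3.
toy = [
    (0, 0, 0, 0),
    (0, 0, 1, 1),
    (1, 1, 0, 0),
    (1, 1, 1, 1),
]
tags = greedy_tag_snps(toy, 2)
```

In this toy example two tags, one from each perfectly correlated pair, carry the same entropy as all four SNPs together, which is the sense in which a tag set "compresses" the haplotype data.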
format	Text
id	pubmed-1525207
institution	National Center for Biotechnology Information
language	English
publishDate	2006
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-1525207 2006-08-07 A model-based approach to selection of tag SNPs. Nicolas, Pierre; Sun, Fengzhu; Li, Lei M. BMC Bioinformatics. Methodology Article. BioMed Central 2006-06-15 /pmc/articles/PMC1525207/ /pubmed/16776821 http://dx.doi.org/10.1186/1471-2105-7-303 Text en Copyright © 2006 Nicolas et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A model-based approach to selection of tag SNPs
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1525207/ https://www.ncbi.nlm.nih.gov/pubmed/16776821 http://dx.doi.org/10.1186/1471-2105-7-303
