
Selecting additional tag SNPs for tolerating missing data in genotyping

BACKGROUND: Recent studies have shown that the patterns of linkage disequilibrium observed in human populations have a block-like structure, and a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block. In reality, some tag SNPs may be missin...


Bibliographic Details
Main authors: Huang, Yao-Ting; Zhang, Kui; Chen, Ting; Chao, Kun-Mao
Format: Text
Language: English
Published: BioMed Central 2005
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1316880/
https://www.ncbi.nlm.nih.gov/pubmed/16259642
http://dx.doi.org/10.1186/1471-2105-6-263
author Huang, Yao-Ting
Zhang, Kui
Chen, Ting
Chao, Kun-Mao
collection PubMed
description BACKGROUND: Recent studies have shown that the patterns of linkage disequilibrium observed in human populations have a block-like structure, and a small subset of SNPs (called tag SNPs) is sufficient to distinguish each pair of haplotype patterns in the block. In reality, some tag SNPs may be missing, and we may fail to distinguish two distinct haplotypes due to the ambiguity caused by missing data. RESULTS: We show that there exists a subset of SNPs (referred to as robust tag SNPs) that can still distinguish all distinct haplotypes even when some SNPs are missing. The problem of finding a minimum set of robust tag SNPs is shown to be NP-hard. To find robust tag SNPs efficiently, we propose two greedy algorithms and one linear programming relaxation algorithm. The experimental results indicate that (1) the solutions found by these algorithms are quite close to the optimal solution; (2) the genotyping cost saved by using tag SNPs can be as high as 80%; and (3) genotyping additional tag SNPs for tolerating missing data is still cost-effective. CONCLUSION: Genotyping robust tag SNPs is more practical than genotyping only the minimum tag SNPs if the occurrence of missing data cannot be avoided. Our theoretical analysis and experimental results show that our algorithms are efficient and that the solutions they find are close to the optimal solution.
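The robust tag SNP idea in the description can be illustrated with a minimal Python sketch. It is not the authors' algorithm: it is a generic greedy heuristic in the spirit of set multicover, written under stated assumptions. It assumes that haplotypes are equal-length strings of alleles, and that a SNP set tolerates `max_missing` missing SNPs when every pair of distinct haplotypes differs at no fewer than `max_missing + 1` of the chosen SNPs (one reading of the abstract). The function name and parameter are hypothetical.

```python
from itertools import combinations


def greedy_robust_tag_snps(haplotypes, max_missing=0):
    """Greedily pick SNP indices so that each pair of haplotypes differs
    at at least max_missing + 1 chosen SNPs (assumed robustness criterion)."""
    n_snps = len(haplotypes[0])
    need = max_missing + 1
    pairs = list(combinations(range(len(haplotypes)), 2))

    # For each SNP, the set of haplotype pairs it distinguishes.
    covers = [
        {p for p in pairs if haplotypes[p[0]][s] != haplotypes[p[1]][s]}
        for s in range(n_snps)
    ]
    # How many more distinguishing SNPs each pair still requires.
    remaining = {p: need for p in pairs}
    unused = set(range(n_snps))
    chosen = []

    def gain(s):
        # Number of still-unsatisfied pairs that SNP s would help.
        return sum(1 for p in covers[s] if remaining[p] > 0)

    while any(remaining.values()):
        best = max(sorted(unused), key=gain, default=None)
        if best is None or gain(best) == 0:
            raise ValueError("no SNP set satisfies the requested tolerance")
        chosen.append(best)
        unused.remove(best)
        for p in covers[best]:
            if remaining[p] > 0:
                remaining[p] -= 1
    return sorted(chosen)


if __name__ == "__main__":
    # Toy block of four haplotypes over five SNPs (illustrative data only).
    block = ["00000", "11100", "01011", "10111"]
    print(greedy_robust_tag_snps(block, max_missing=0))
    print(greedy_robust_tag_snps(block, max_missing=1))
```

With `max_missing=0` the sketch reduces to ordinary tag SNP selection. Raising `max_missing` can only keep the set the same size or grow it, which is the trade-off between genotyping cost and tolerance that the abstract refers to.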
format Text
id pubmed-1316880
institution National Center for Biotechnology Information
language English
publishDate 2005
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1316880 2006-01-09 BMC Bioinformatics Methodology Article BioMed Central 2005-11-01 Text en Copyright © 2005 Huang et al; licensee BioMed Central Ltd.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Selecting additional tag SNPs for tolerating missing data in genotyping
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1316880/
https://www.ncbi.nlm.nih.gov/pubmed/16259642
http://dx.doi.org/10.1186/1471-2105-6-263