CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data
BACKGROUND: Recent studies have shown that genetic variation is the basis of genome-wide disease association research. However, because genotyping a large number of single nucleotide polymorphisms (SNPs) is costly, it is essential to choose a small subset of informative SNPs (tagSNPs), which are...
Main authors: | Wang, Jun; Guo, Mao-zu; Wang, Chun-yu |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2009 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648800/ https://www.ncbi.nlm.nih.gov/pubmed/19208176 http://dx.doi.org/10.1186/1471-2105-10-S1-S71 |
_version_ | 1782164991299813376 |
---|---|
author | Wang, Jun Guo, Mao-zu Wang, Chun-yu |
author_facet | Wang, Jun Guo, Mao-zu Wang, Chun-yu |
author_sort | Wang, Jun |
collection | PubMed |
description | BACKGROUND: Recent studies have shown that genetic variation is the basis of genome-wide disease association research. However, because genotyping a large number of single nucleotide polymorphisms (SNPs) is costly, it is essential to choose a small subset of informative SNPs (tagSNPs) that capture most of the variation in a population and can represent the remaining SNPs. Several methods have been proposed to find the minimum set of tagSNPs, but most of them still have disadvantages such as information loss and block-partition limits. RESULTS: This paper proposes a new hybrid method named CGTS, which combines ideas from clustering and graph algorithms to select tagSNPs from genotype data. The method aims to maximize the number of non-tag SNPs discarded from the given set. CGTS integrates information on linkage disequilibrium (LD) association and genotype diversity using site graphs, and discards redundant SNPs using an algorithm based on these graph structures. The clustering algorithm is used to reduce the running time of CGTS. The efficiency of the algorithm and the quality of its solutions are evaluated on biological data, and comparisons with three popular selection methods are presented. CONCLUSION: Our theoretical analysis and experimental results show that CGTS is not only more efficient than other methods but also achieves higher accuracy in tagSNP selection. |
format | Text |
id | pubmed-2648800 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2009 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-2648800 2009-03-03 CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data Wang, Jun Guo, Mao-zu Wang, Chun-yu BMC Bioinformatics Research BACKGROUND: Recent studies have shown that genetic variation is the basis of genome-wide disease association research. However, because genotyping a large number of single nucleotide polymorphisms (SNPs) is costly, it is essential to choose a small subset of informative SNPs (tagSNPs) that capture most of the variation in a population and can represent the remaining SNPs. Several methods have been proposed to find the minimum set of tagSNPs, but most of them still have disadvantages such as information loss and block-partition limits. RESULTS: This paper proposes a new hybrid method named CGTS, which combines ideas from clustering and graph algorithms to select tagSNPs from genotype data. The method aims to maximize the number of non-tag SNPs discarded from the given set. CGTS integrates information on linkage disequilibrium (LD) association and genotype diversity using site graphs, and discards redundant SNPs using an algorithm based on these graph structures. The clustering algorithm is used to reduce the running time of CGTS. The efficiency of the algorithm and the quality of its solutions are evaluated on biological data, and comparisons with three popular selection methods are presented. CONCLUSION: Our theoretical analysis and experimental results show that CGTS is not only more efficient than other methods but also achieves higher accuracy in tagSNP selection. BioMed Central 2009-01-30 /pmc/articles/PMC2648800/ /pubmed/19208176 http://dx.doi.org/10.1186/1471-2105-10-S1-S71 Text en Copyright © 2009 Wang et al; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Wang, Jun Guo, Mao-zu Wang, Chun-yu CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data |
title | CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data |
title_full | CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data |
title_fullStr | CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data |
title_full_unstemmed | CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data |
title_short | CGTS: a site-clustering graph based tagSNP selection algorithm in genotype data |
title_sort | cgts: a site-clustering graph based tagsnp selection algorithm in genotype data |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2648800/ https://www.ncbi.nlm.nih.gov/pubmed/19208176 http://dx.doi.org/10.1186/1471-2105-10-S1-S71 |
work_keys_str_mv | AT wangjun cgtsasiteclusteringgraphbasedtagsnpselectionalgorithmingenotypedata AT guomaozu cgtsasiteclusteringgraphbasedtagsnpselectionalgorithmingenotypedata AT wangchunyu cgtsasiteclusteringgraphbasedtagsnpselectionalgorithmingenotypedata |
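
The description field says that tagSNPs are chosen by building site graphs from LD association and then discarding redundant SNPs. This record does not give the details of CGTS, so the block below is only a generic sketch of graph-based tag selection, not the authors' method. It assumes genotypes coded as 0/1/2, uses squared Pearson correlation as a stand-in LD measure, links two sites when that value reaches a hypothetical threshold of 0.8, and picks tags with a simple greedy covering heuristic. The SNP identifiers and genotype values are invented for illustration.

```python
from itertools import combinations


def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors (0/1/2 coding)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # monomorphic site: correlation undefined, treat as unlinked
    return (sxy * sxy) / (sxx * syy)


def build_site_graph(genotypes, threshold=0.8):
    """Link two sites when their r^2 meets the (assumed) threshold.

    genotypes: dict mapping a SNP id to its per-individual genotype list.
    """
    graph = {snp: set() for snp in genotypes}
    for a, b in combinations(genotypes, 2):
        if r_squared(genotypes[a], genotypes[b]) >= threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph


def greedy_tag_snps(graph):
    """Repeatedly pick the uncovered site with the most uncovered neighbours."""
    uncovered = set(graph)
    tags = []
    while uncovered:
        # sorted() makes tie-breaking deterministic (smallest id wins).
        best = max(sorted(uncovered), key=lambda s: len(graph[s] & uncovered))
        tags.append(best)
        uncovered -= graph[best] | {best}  # the tag represents its neighbours
    return tags


# Hypothetical toy data: six individuals, four sites.
genotypes = {
    "rs1": [0, 1, 2, 0, 1, 2],
    "rs2": [0, 1, 2, 0, 1, 2],  # identical to rs1
    "rs3": [2, 1, 0, 2, 1, 0],  # mirror image of rs1, so r^2 is still 1
    "rs4": [0, 0, 1, 2, 2, 1],  # uncorrelated with rs1
}
tags = greedy_tag_snps(build_site_graph(genotypes))
print(tags)
```

In this toy input rs1, rs2 and rs3 are mutually redundant, so one of them can stand for all three, while rs4 has no neighbours and must be kept as its own tag. A greedy cover is not guaranteed to find the minimum tag set; it is used here only because it is short and easy to follow.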