
FastTagger: an efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium

BACKGROUND: The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs play an important role in understanding the association between genetic variations and human diseases. Many SNPs show correlated genotypes, or linkage disequilibrium (LD); thus it is not necess...


Bibliographic Details
Main authors: Liu, Guimei, Wang, Yue, Wong, Limsoon
Format: Text
Language: English
Published: BioMed Central 2010
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3098109/
https://www.ncbi.nlm.nih.gov/pubmed/20113476
http://dx.doi.org/10.1186/1471-2105-11-66
author Liu, Guimei
Wang, Yue
Wong, Limsoon
collection PubMed
description BACKGROUND: The human genome contains millions of common single nucleotide polymorphisms (SNPs), and these SNPs play an important role in understanding the association between genetic variations and human diseases. Many SNPs show correlated genotypes, or linkage disequilibrium (LD), so it is not necessary to genotype all SNPs for an association study. Many algorithms have been developed to find a small subset of SNPs, called tag SNPs, that are sufficient to infer all the other SNPs. Algorithms based on the r² LD statistic have gained popularity because r² is directly related to the statistical power to detect disease associations. Most existing r²-based algorithms use pairwise LD. Recent studies show that multi-marker LD can help further reduce the number of tag SNPs. However, existing tag SNP selection algorithms based on multi-marker LD are both time-consuming and memory-consuming. They cannot work on chromosomes containing more than 100k SNPs using length-3 tagging rules.
RESULTS: We propose an efficient algorithm called FastTagger to calculate multi-marker tagging rules and select tag SNPs based on multi-marker LD. FastTagger uses several techniques to reduce running time and memory consumption. Our experimental results show that FastTagger is several times faster than existing multi-marker-based tag SNP selection algorithms while consuming much less memory. As a result, FastTagger can work on chromosomes containing more than 100k SNPs using length-3 tagging rules. FastTagger also produces smaller sets of tag SNPs than existing multi-marker-based algorithms; the reduction ratio ranges from 3% to 9% when length-3 tagging rules are used. The generated tagging rules can also be used for genotype imputation. We studied the prediction accuracy of individual rules, and the average accuracy is above 96% when r² ≥ 0.9.
CONCLUSIONS: Generating multi-marker tagging rules is a computation-intensive task, and it is the bottleneck of existing multi-marker-based tag SNP selection methods. FastTagger is a practical and scalable algorithm to solve this problem.
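The abstract's baseline idea, selecting tag SNPs from pairwise r², can be illustrated with a minimal sketch. This is not FastTagger and not the authors' code: it covers only pairwise LD (no multi-marker or length-3 rules), it assumes phased biallelic haplotypes coded as 0/1, and the function names `r_squared` and `greedy_tag_snps` as well as the greedy covering strategy are assumptions made for illustration.

```python
from itertools import combinations


def r_squared(hap_a, hap_b):
    """Pairwise r^2 between two biallelic SNPs from phased 0/1 haplotype alleles."""
    n = len(hap_a)
    p_a = sum(hap_a) / n                      # frequency of allele 1 at SNP A
    p_b = sum(hap_b) / n                      # frequency of allele 1 at SNP B
    p_ab = sum(1 for a, b in zip(hap_a, hap_b) if a == 1 and b == 1) / n
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        return 0.0                            # monomorphic SNP: treated as r^2 = 0 here
    d = p_ab - p_a * p_b                      # LD coefficient D
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.9):
    """Greedy pairwise tagging (illustrative assumption, not the paper's method).

    haplotypes maps a SNP name to its list of 0/1 alleles across haplotypes.
    Repeatedly picks the SNP that tags the most still-untagged SNPs.
    """
    names = list(haplotypes)
    covers = {s: {s} for s in names}          # every SNP tags itself
    for s, t in combinations(names, 2):
        if r_squared(haplotypes[s], haplotypes[t]) >= threshold:
            covers[s].add(t)
            covers[t].add(s)
    untagged = set(names)
    tags = []
    while untagged:
        best = max(names, key=lambda s: len(covers[s] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags
```

With the threshold at 0.9, a SNP counts as tagged only when some selected SNP reaches r² ≥ 0.9 with it, which is the threshold the abstract quotes for rule accuracy. Multi-marker tagging would additionally let combinations of two or more SNPs predict a third, which this sketch does not attempt.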
format Text
id pubmed-3098109
institution National Center for Biotechnology Information
language English
publishDate 2010
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3098109 2011-05-20 BMC Bioinformatics Research Article BioMed Central 2010-01-29 /pmc/articles/PMC3098109/ /pubmed/20113476 http://dx.doi.org/10.1186/1471-2105-11-66 Text en Copyright ©2010 Liu et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title FastTagger: an efficient algorithm for genome-wide tag SNP selection using multi-marker linkage disequilibrium
topic Research Article