
Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies

The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity in statistical analyses. Tackling these problems with a marker-set study such...


Bibliographic Details
Main Authors:	Wang, Charlotte; Kao, Wen-Hsin; Hsiao, Chuhsing Kate
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2015
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547758/ https://www.ncbi.nlm.nih.gov/pubmed/26302001 http://dx.doi.org/10.1371/journal.pone.0135918

_version_	1782387107620192256
author	Wang, Charlotte Kao, Wen-Hsin Hsiao, Chuhsing Kate
author_facet	Wang, Charlotte Kao, Wen-Hsin Hsiao, Chuhsing Kate
author_sort	Wang, Charlotte
collection	PubMed
description	The availability of high-throughput genomic data has led to several challenges in recent genetic association studies, including the large number of genetic variants that must be considered and the computational complexity in statistical analyses. Tackling these problems with a marker-set study such as SNP-set analysis can be an efficient solution. To construct SNP-sets, we first propose a clustering algorithm, which employs Hamming distance to measure the similarity between strings of SNP genotypes and evaluates whether the given SNPs or SNP-sets should be clustered. A dendrogram can then be constructed based on such distance measure, and the number of clusters can be determined. With the resulting SNP-sets, we next develop an association test HDAT to examine susceptibility to the disease of interest. This proposed test assesses, based on Hamming distance, whether the similarity between a diseased and a normal individual differs from the similarity between two individuals of the same disease status. In our proposed methodology, only genotype information is needed. No inference of haplotypes is required, and SNPs under consideration do not need to locate in nearby regions. The proposed clustering algorithm and association test are illustrated with applications and simulation studies. As compared with other existing methods, the clustering algorithm is faster and better at identifying sets containing SNPs exerting a similar effect. In addition, the simulation studies demonstrated that the proposed test works well for SNP-sets containing a large proportion of neutral SNPs. Furthermore, employing the clustering algorithm before testing a large set of data improves the knowledge in confining the genetic regions for susceptible genetic markers.
format	Online Article Text
id	pubmed-4547758
institution	National Center for Biotechnology Information
language	English
publishDate	2015
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-45477582015-09-01 Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies Wang, Charlotte Kao, Wen-Hsin Hsiao, Chuhsing Kate PLoS One Research Article
Public Library of Science 2015-08-24 /pmc/articles/PMC4547758/ /pubmed/26302001 http://dx.doi.org/10.1371/journal.pone.0135918 Text en © 2015 Wang et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
spellingShingle	Research Article Wang, Charlotte Kao, Wen-Hsin Hsiao, Chuhsing Kate Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies
title	Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies
title_full	Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies
title_fullStr	Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies
title_full_unstemmed	Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies
title_short	Using Hamming Distance as Information for SNP-Sets Clustering and Testing in Disease Association Studies
title_sort	using hamming distance as information for snp-sets clustering and testing in disease association studies
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4547758/ https://www.ncbi.nlm.nih.gov/pubmed/26302001 http://dx.doi.org/10.1371/journal.pone.0135918
work_keys_str_mv	AT wangcharlotte usinghammingdistanceasinformationforsnpsetsclusteringandtestingindiseaseassociationstudies AT kaowenhsin usinghammingdistanceasinformationforsnpsetsclusteringandtestingindiseaseassociationstudies AT hsiaochuhsingkate usinghammingdistanceasinformationforsnpsetsclusteringandtestingindiseaseassociationstudies
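
The description in this record says that Hamming distance is used in two ways: to measure similarity between strings of SNP genotypes (for clustering), and to compare the similarity of a diseased and a normal individual with that of two individuals of the same disease status (for testing). The following is a minimal Python sketch of those two distance computations only. It is not the authors' clustering algorithm or the HDAT statistic, neither of which is specified in this record. The toy genotype matrix, the 0/1/2 coding, the status labels, and the function names `snp_distances` and `mean_pair_distances` are all assumptions made for illustration.

```python
from itertools import combinations


def hamming(a, b):
    """Count the positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


# Hypothetical toy data: rows are individuals, columns are SNPs,
# genotypes coded as minor-allele counts (0, 1, or 2).
genotypes = [
    [0, 0, 2, 1],
    [1, 1, 2, 0],
    [2, 2, 0, 1],
    [0, 0, 1, 2],
]
# Hypothetical labels: 1 = diseased, 0 = normal.
status = [1, 1, 0, 0]


def snp_distances(genotypes):
    """Hamming distance between every pair of SNPs (columns),
    each SNP being the string of its genotypes across individuals."""
    columns = list(zip(*genotypes))
    return {
        (i, j): hamming(columns[i], columns[j])
        for i, j in combinations(range(len(columns)), 2)
    }


def mean_pair_distances(genotypes, status):
    """Mean Hamming distance over pairs of individuals with the same
    status and over pairs with different status.
    Assumes at least one pair of each kind exists."""
    within, between = [], []
    for i, j in combinations(range(len(genotypes)), 2):
        d = hamming(genotypes[i], genotypes[j])
        (within if status[i] == status[j] else between).append(d)
    return sum(within) / len(within), sum(between) / len(between)


if __name__ == "__main__":
    print(snp_distances(genotypes))
    print(mean_pair_distances(genotypes, status))
```

A pairwise SNP distance table like the one returned by `snp_distances` is the kind of input from which a dendrogram could be built. How the published method merges SNP-sets and chooses the number of clusters is described in the article itself, not here.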

