Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data

BACKGROUND: Due to the low statistical power of individual markers from a genome-wide association study (GWAS), detecting causal single nucleotide polymorphisms (SNPs) for complex diseases is a challenge. SNP combinations are suggested to compensate for the low statistical power of individual marker...

Bibliographic Details
Main Authors:	Kang, Chiyong; Yu, Hyeji; Yi, Gwan-Su
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2013
Subjects:	Proceedings
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3618247/ https://www.ncbi.nlm.nih.gov/pubmed/23566118 http://dx.doi.org/10.1186/1472-6947-13-S1-S3

_version_	1782265385832153088
author	Kang, Chiyong Yu, Hyeji Yi, Gwan-Su
author_facet	Kang, Chiyong Yu, Hyeji Yi, Gwan-Su
author_sort	Kang, Chiyong
collection	PubMed
description	BACKGROUND: Due to the low statistical power of individual markers from a genome-wide association study (GWAS), detecting causal single nucleotide polymorphisms (SNPs) for complex diseases is a challenge. SNP combinations are suggested to compensate for the low statistical power of individual markers, but SNP combinations from GWAS generate high computational complexity. METHODS: We aim to detect type 2 diabetes (T2D) causal SNP combinations from a GWAS dataset with optimal filtration and to discover the biological meaning of the detected SNP combinations. Optimal filtration can enhance the statistical power of SNP combinations by comparing the error rates of SNP combinations from various Bonferroni thresholds and p-value range-based thresholds combined with linkage disequilibrium (LD) pruning. T2D causal SNP combinations are selected using random forests with variable selection from an optimal SNP dataset. T2D causal SNP combinations and genome-wide SNPs are mapped into functional modules using expanded gene set enrichment analysis (GSEA) considering pathway, transcription factor (TF)-target, miRNA-target, gene ontology, and protein complex functional modules. The prediction error rates are measured for SNP sets from functional module-based filtration, which selects SNPs within functional modules from genome-wide SNPs based on expanded GSEA. RESULTS: A T2D causal SNP combination containing 101 SNPs from the Wellcome Trust Case Control Consortium (WTCCC) GWAS dataset is selected using optimal filtration criteria, with an error rate of 10.25%. Matching the 101 SNPs with known T2D genes and functional modules reveals the relationships between T2D and SNP combinations. The prediction error rates of SNP sets from functional module-based filtration show no significant difference from the prediction error rates of randomly selected SNP sets and T2D causal SNP combinations from optimal filtration. CONCLUSIONS: We propose a detection method for complex disease causal SNP combinations from an optimal SNP dataset by using random forests with variable selection. Mapping the biological meanings of detected SNP combinations can help uncover complex disease mechanisms.
format	Online Article Text
id	pubmed-3618247
institution	National Center for Biotechnology Information
language	English
publishDate	2013
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3618247 2013-04-09 Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data Kang, Chiyong Yu, Hyeji Yi, Gwan-Su BMC Med Inform Decis Mak Proceedings BioMed Central 2013-04-05 /pmc/articles/PMC3618247/ /pubmed/23566118 http://dx.doi.org/10.1186/1472-6947-13-S1-S3 Text en Copyright © 2013 Kang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Proceedings Kang, Chiyong Yu, Hyeji Yi, Gwan-Su Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data
title	Finding type 2 diabetes causal single nucleotide polymorphism combinations and functional modules from genome-wide association data
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3618247/ https://www.ncbi.nlm.nih.gov/pubmed/23566118 http://dx.doi.org/10.1186/1472-6947-13-S1-S3
work_keys_str_mv	AT kangchiyong findingtype2diabetescausalsinglenucleotidepolymorphismcombinationsandfunctionalmodulesfromgenomewideassociationdata AT yuhyeji findingtype2diabetescausalsinglenucleotidepolymorphismcombinationsandfunctionalmodulesfromgenomewideassociationdata AT yigwansu findingtype2diabetescausalsinglenucleotidepolymorphismcombinationsandfunctionalmodulesfromgenomewideassociationdata

Similar Items