
Analysis of genome-wide association study data using the protein knowledge base

Bibliographic Details
Main Authors:	Ballouz, Sara, Liu, Jason Y, Oti, Martin, Gaeta, Bruno, Fatkin, Diane, Bahlo, Melanie, Wouters, Merridee A
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2011
Subjects:	Methodology Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3261104/ https://www.ncbi.nlm.nih.gov/pubmed/22077927 http://dx.doi.org/10.1186/1471-2156-12-98

author	Ballouz, Sara Liu, Jason Y Oti, Martin Gaeta, Bruno Fatkin, Diane Bahlo, Melanie Wouters, Merridee A
collection	PubMed
description	BACKGROUND: Genome-wide association studies (GWAS) aim to identify causal variants and genes for complex disease by independently testing a large number of SNP markers for disease association. Although genes have been implicated in these studies, few utilise the multiple-hit model of complex disease to identify causal candidates. A major benefit of multi-locus comparison is that it compensates for some shortcomings of current statistical analyses that test the frequency of each SNP in isolation for the phenotype population versus control. RESULTS: Here we developed and benchmarked several protocols for GWAS data analysis using different in-silico gene prediction and prioritisation methodologies. We adopted a high sensitivity approach to the data, using less conservative statistical SNP associations. Multiple gene search spaces, either fixed-width or proximity-based, were generated around each SNP marker. We used the candidate disease gene prediction system Gentrepid to identify candidates based on shared biomolecular pathways or domain-based protein homology. Predictions were made either with phenotype-specific known disease genes as input, or without a priori knowledge, by exhaustive comparison of genes in distinct loci. Because Gentrepid uses biomolecular data to find interactions and common features between genes in distinct loci of the search spaces, it takes advantage of the multi-locus aspect of the data. CONCLUSIONS: Results suggest testing multiple SNP-to-gene search spaces compensates for differences in phenotypes, populations and SNP platforms. Surprisingly, domain-based homology information was more informative when benchmarked against gene candidates reported by GWA studies than against previously determined disease genes, possibly suggesting a larger contribution of gene homologs to complex diseases than Mendelian diseases.
format	Online Article Text
id	pubmed-3261104
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3261104 2012-01-19 Analysis of genome-wide association study data using the protein knowledge base Ballouz, Sara Liu, Jason Y Oti, Martin Gaeta, Bruno Fatkin, Diane Bahlo, Melanie Wouters, Merridee A BMC Genet Methodology Article BioMed Central 2011-11-13 /pmc/articles/PMC3261104/ /pubmed/22077927 http://dx.doi.org/10.1186/1471-2156-12-98 Text en Copyright ©2011 Ballouz et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Analysis of genome-wide association study data using the protein knowledge base
topic	Methodology Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3261104/ https://www.ncbi.nlm.nih.gov/pubmed/22077927 http://dx.doi.org/10.1186/1471-2156-12-98
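The search-space and multi-locus ideas in the description above can be illustrated with a minimal Python sketch. It assumes toy gene coordinates, reads "fixed-width" as a window of ±`half_width` bp around each SNP and "proximity-based" as the `k` nearest genes on the same chromosome, and treats a shared pathway or domain identifier as the link between genes in distinct loci. All names and data (`GENES`, `ANNOTATIONS`, `fixed_width_space`, `proximity_space`, `cross_locus_pairs`, `seeded_candidates`) are hypothetical; this is not Gentrepid's implementation, data sources or scoring.

```python
from itertools import combinations

# Hypothetical toy data: gene -> (chromosome, start, end) in bp, and gene -> annotation IDs.
GENES = {
    "GENE_A": ("1", 10_000, 20_000),
    "GENE_B": ("1", 50_000, 60_000),
    "GENE_C": ("2", 5_000, 8_000),
    "GENE_D": ("2", 100_000, 120_000),
}
ANNOTATIONS = {
    "GENE_A": {"pathway:P1"},
    "GENE_B": {"domain:D9"},
    "GENE_C": {"pathway:P1", "domain:D2"},
    "GENE_D": {"domain:D9"},
}


def fixed_width_space(chrom, pos, genes, half_width):
    """Genes overlapping [pos - half_width, pos + half_width] on the SNP's chromosome."""
    lo, hi = pos - half_width, pos + half_width
    return {g for g, (c, s, e) in genes.items() if c == chrom and s <= hi and e >= lo}


def proximity_space(chrom, pos, genes, k=1):
    """The k genes on the SNP's chromosome nearest to it (distance 0 if the SNP is inside a gene)."""
    def distance(start, end):
        if start <= pos <= end:
            return 0
        return min(abs(pos - start), abs(pos - end))

    ranked = sorted((distance(s, e), g) for g, (c, s, e) in genes.items() if c == chrom)
    return {g for _, g in ranked[:k]}


def cross_locus_pairs(loci, annotations):
    """Exhaustive mode (no prior knowledge): gene pairs from two distinct loci sharing an annotation."""
    pairs = set()
    for locus_a, locus_b in combinations(loci, 2):
        for g1 in locus_a:
            for g2 in locus_b:
                if g1 != g2 and annotations.get(g1, set()) & annotations.get(g2, set()):
                    pairs.add(tuple(sorted((g1, g2))))
    return pairs


def seeded_candidates(loci, annotations, known_genes):
    """Seeded mode: genes in any locus sharing an annotation with a known disease gene."""
    known_terms = set().union(*(annotations.get(g, set()) for g in known_genes))
    return {g for locus in loci for g in locus if annotations.get(g, set()) & known_terms}


if __name__ == "__main__":
    # One search space per SNP marker; here two SNPs with different window widths.
    loci = [
        fixed_width_space("1", 15_000, GENES, 50_000),  # {"GENE_A", "GENE_B"}
        fixed_width_space("2", 7_000, GENES, 10_000),   # {"GENE_C"}
    ]
    print(cross_locus_pairs(loci, ANNOTATIONS))                # {('GENE_A', 'GENE_C')}
    print(seeded_candidates(loci, ANNOTATIONS, {"GENE_D"}))    # {'GENE_B'}
    print(proximity_space("1", 30_000, GENES))                 # {'GENE_A'}
```

Genes within the same locus are never compared with each other, which mirrors the description's emphasis on common features between genes in distinct loci; running the same comparison over several search-space definitions is how one would probe the effect of window choice.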

Similar items