
Analysis of genome-wide association study data using the protein knowledge base

BACKGROUND: Genome-wide association studies (GWAS) aim to identify causal variants and genes for complex disease by independently testing a large number of SNP markers for disease association. Although genes have been implicated in these studies, few utilise the multiple-hit model of complex disease...


Bibliographic Details
Main Authors: Ballouz, Sara, Liu, Jason Y, Oti, Martin, Gaeta, Bruno, Fatkin, Diane, Bahlo, Melanie, Wouters, Merridee A
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects: Methodology Article
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3261104/
https://www.ncbi.nlm.nih.gov/pubmed/22077927
http://dx.doi.org/10.1186/1471-2156-12-98
author Ballouz, Sara
Liu, Jason Y
Oti, Martin
Gaeta, Bruno
Fatkin, Diane
Bahlo, Melanie
Wouters, Merridee A
collection PubMed
description BACKGROUND: Genome-wide association studies (GWAS) aim to identify causal variants and genes for complex disease by independently testing a large number of SNP markers for disease association. Although genes have been implicated in these studies, few utilise the multiple-hit model of complex disease to identify causal candidates. A major benefit of multi-locus comparison is that it compensates for some shortcomings of current statistical analyses, which test the frequency of each SNP in isolation in the phenotype population versus controls. RESULTS: Here we developed and benchmarked several protocols for GWAS data analysis using different in silico gene prediction and prioritisation methodologies. We adopted a high-sensitivity approach to the data, using less conservative statistical SNP associations. Multiple gene search spaces, either fixed-width or proximity-based, were generated around each SNP marker. We used the candidate disease gene prediction system Gentrepid to identify candidates based on shared biomolecular pathways or domain-based protein homology. Predictions were made either with phenotype-specific known disease genes as input, or without a priori knowledge, by exhaustive comparison of genes in distinct loci. Because Gentrepid uses biomolecular data to find interactions and common features between genes in distinct loci of the search spaces, it takes advantage of the multi-locus aspect of the data. CONCLUSIONS: The results suggest that testing multiple SNP-to-gene search spaces compensates for differences in phenotypes, populations and SNP platforms. Surprisingly, domain-based homology information was more informative when benchmarked against gene candidates reported by GWA studies than against previously determined disease genes, possibly suggesting a larger contribution of gene homologs to complex diseases than to Mendelian diseases.
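The abstract mentions two kinds of SNP-to-gene search space, fixed-width and proximity-based, without giving their exact definitions. The following is a minimal Python sketch of one plausible reading, not the authors' implementation: a fixed-width space is taken to be all genes overlapping a window centred on the SNP, and a proximity-based space is taken to be the k nearest genes on the same chromosome. The gene names, coordinates, window width and the `Gene`, `fixed_width_space` and `nearest_genes_space` names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """A gene as a closed interval on a chromosome (hypothetical data model)."""
    name: str
    chrom: str
    start: int
    end: int


def fixed_width_space(genes, chrom, snp_pos, width):
    """Return names of genes overlapping a window of `width` bp centred on the SNP."""
    lo = snp_pos - width // 2
    hi = snp_pos + width // 2
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


def distance_to_gene(gene, pos):
    """Distance in bp from a position to a gene; 0 if the position lies inside it."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(gene.start - pos), abs(gene.end - pos))


def nearest_genes_space(genes, chrom, snp_pos, k=1):
    """Return names of the k genes closest to the SNP on the same chromosome."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    # Sort by distance, breaking ties by start coordinate for determinism.
    same_chrom.sort(key=lambda g: (distance_to_gene(g, snp_pos), g.start))
    return [g.name for g in same_chrom[:k]]


# Hypothetical annotation, for illustration only.
GENES = [
    Gene("GENE_A", "chr1", 100_000, 120_000),
    Gene("GENE_B", "chr1", 400_000, 450_000),
    Gene("GENE_C", "chr1", 900_000, 950_000),
    Gene("GENE_D", "chr2", 1_000_000, 1_010_000),
]

if __name__ == "__main__":
    # A SNP at chr1:300,000 with a 500 kb window versus its single nearest gene.
    print(fixed_width_space(GENES, "chr1", 300_000, 500_000))
    print(nearest_genes_space(GENES, "chr1", 300_000, k=1))
```

In this sketch the two constructions can return different candidate sets for the same SNP, which is in keeping with the record's point that testing multiple search spaces may compensate for differences between studies.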
format Online
Article
Text
id pubmed-3261104
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3261104 2012-01-19 Analysis of genome-wide association study data using the protein knowledge base Ballouz, Sara Liu, Jason Y Oti, Martin Gaeta, Bruno Fatkin, Diane Bahlo, Melanie Wouters, Merridee A BMC Genet Methodology Article BioMed Central 2011-11-13 /pmc/articles/PMC3261104/ /pubmed/22077927 http://dx.doi.org/10.1186/1471-2156-12-98 Text en Copyright ©2011 Ballouz et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Analysis of genome-wide association study data using the protein knowledge base
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3261104/
https://www.ncbi.nlm.nih.gov/pubmed/22077927
http://dx.doi.org/10.1186/1471-2156-12-98