Analysis of genome-wide association study data using the protein knowledge base
BACKGROUND: Genome-wide association studies (GWAS) aim to identify causal variants and genes for complex disease by independently testing a large number of SNP markers for disease association. Although genes have been implicated in these studies, few utilise the multiple-hit model of complex disease...
Main Authors: | Ballouz, Sara; Liu, Jason Y; Oti, Martin; Gaeta, Bruno; Fatkin, Diane; Bahlo, Melanie; Wouters, Merridee A |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | Methodology Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3261104/ https://www.ncbi.nlm.nih.gov/pubmed/22077927 http://dx.doi.org/10.1186/1471-2156-12-98 |
_version_ | 1782221551615082496 |
---|---|
author | Ballouz, Sara Liu, Jason Y Oti, Martin Gaeta, Bruno Fatkin, Diane Bahlo, Melanie Wouters, Merridee A |
author_sort | Ballouz, Sara |
collection | PubMed |
description | BACKGROUND: Genome-wide association studies (GWAS) aim to identify causal variants and genes for complex disease by independently testing a large number of SNP markers for disease association. Although genes have been implicated in these studies, few utilise the multiple-hit model of complex disease to identify causal candidates. A major benefit of multi-locus comparison is that it compensates for some shortcomings of current statistical analyses that test the frequency of each SNP in isolation for the phenotype population versus control. RESULTS: Here we developed and benchmarked several protocols for GWAS data analysis using different in-silico gene prediction and prioritisation methodologies. We adopted a high sensitivity approach to the data, using less conservative statistical SNP associations. Multiple gene search spaces, either of fixed-widths or proximity-based, were generated around each SNP marker. We used the candidate disease gene prediction system Gentrepid to identify candidates based on shared biomolecular pathways or domain-based protein homology. Predictions were made either with phenotype-specific known disease genes as input; or without a priori knowledge, by exhaustive comparison of genes in distinct loci. Because Gentrepid uses biomolecular data to find interactions and common features between genes in distinct loci of the search spaces, it takes advantage of the multi-locus aspect of the data. CONCLUSIONS: Results suggest testing multiple SNP-to-gene search spaces compensates for differences in phenotypes, populations and SNP platforms. Surprisingly, domain-based homology information was more informative when benchmarked against gene candidates reported by GWA studies compared to previously determined disease genes, possibly suggesting a larger contribution of gene homologs to complex diseases than Mendelian diseases. |
format | Online Article Text |
id | pubmed-3261104 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3261104 2012-01-19 BMC Genet Methodology Article BioMed Central 2011-11-13 /pmc/articles/PMC3261104/ /pubmed/22077927 http://dx.doi.org/10.1186/1471-2156-12-98 Text en Copyright ©2011 Ballouz et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | Analysis of genome-wide association study data using the protein knowledge base |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3261104/ https://www.ncbi.nlm.nih.gov/pubmed/22077927 http://dx.doi.org/10.1186/1471-2156-12-98 |
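
The abstract in this record describes generating gene search spaces around each SNP marker, either of fixed width or proximity-based. The following is a minimal Python sketch of that idea only, not the authors' protocol or Gentrepid's implementation. The `Gene` record, the function names, the gene names, and all coordinates are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    """Hypothetical gene record: name plus chromosome and base-pair span."""
    name: str
    chrom: str
    start: int
    end: int


def fixed_width_space(genes, chrom, pos, half_width):
    """Return names of genes overlapping [pos - half_width, pos + half_width]."""
    lo, hi = pos - half_width, pos + half_width
    return [g.name for g in genes
            if g.chrom == chrom and g.start <= hi and g.end >= lo]


def distance_to_gene(gene, pos):
    """Distance in base pairs from a position to a gene (0 if inside it)."""
    if gene.start <= pos <= gene.end:
        return 0
    return min(abs(gene.start - pos), abs(gene.end - pos))


def nearest_genes_space(genes, chrom, pos, k=1):
    """Return names of the k genes closest to the SNP on the same chromosome."""
    same_chrom = [g for g in genes if g.chrom == chrom]
    same_chrom.sort(key=lambda g: (distance_to_gene(g, pos), g.start))
    return [g.name for g in same_chrom[:k]]


# Made-up annotation and a made-up SNP position, for illustration only.
GENES = [
    Gene("GENE_A", "chr1", 1000, 2000),
    Gene("GENE_B", "chr1", 5000, 6000),
    Gene("GENE_C", "chr1", 20000, 21000),
    Gene("GENE_D", "chr2", 1500, 2500),
]
SNP_CHROM, SNP_POS = "chr1", 4000

narrow = fixed_width_space(GENES, SNP_CHROM, SNP_POS, half_width=1000)
wide = fixed_width_space(GENES, SNP_CHROM, SNP_POS, half_width=3000)
nearest_two = nearest_genes_space(GENES, SNP_CHROM, SNP_POS, k=2)
print(narrow, wide, nearest_two)
```

Widening the fixed window or raising `k` enlarges the candidate set for a SNP, which is the kind of trade-off that testing multiple search spaces would explore.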