
An Open Access Database of Genome-wide Association Results

BACKGROUND: The number of genome-wide association studies (GWAS) is growing rapidly, leading to the discovery and replication of many new disease loci. Combining results from multiple GWAS datasets may potentially strengthen previous conclusions and suggest new disease loci, pathways or pleiotropic g...


Bibliographic Details
Main Authors: Johnson, Andrew D; O'Donnell, Christopher J
Format: Text
Language: English
Published: BioMed Central 2009
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2639349/
https://www.ncbi.nlm.nih.gov/pubmed/19161620
http://dx.doi.org/10.1186/1471-2350-10-6
author Johnson, Andrew D
O'Donnell, Christopher J
collection PubMed
description BACKGROUND: The number of genome-wide association studies (GWAS) is growing rapidly, leading to the discovery and replication of many new disease loci. Combining results from multiple GWAS datasets may potentially strengthen previous conclusions and suggest new disease loci, pathways or pleiotropic genes. However, no database or centralized resource currently exists that contains anywhere near the full scope of GWAS results.
METHODS: We collected available results from 118 GWAS articles into a database of 56,411 significant SNP-phenotype associations and accompanying information, making this database freely available here. In doing so, we encountered, and describe here, a number of challenges to creating an open access database of GWAS results. Through preliminary analyses and characterization of available GWAS, we demonstrate the potential to gain new insights by querying a database across GWAS.
RESULTS: Using a genomic bin-based density analysis to search for highly associated regions of the genome, positive control loci (e.g., MHC loci) were detected with high sensitivity. Likewise, an analysis of highly repeated SNPs across GWAS identified replicated loci (e.g., APOE, LPL). At the same time we identified novel, highly suggestive loci for a variety of traits that did not meet genome-wide significance thresholds in prior analyses, in some cases with strong support from the primary medical genetics literature (SLC16A7, CSMD1, OAS1), suggesting these genes merit further study. Additional adjustment for linkage disequilibrium within most regions with a high density of GWAS associations did not materially alter our findings. Having a centralized database with standardized gene annotation also allowed us to examine the representation of functional gene categories (gene ontologies) containing one or more associations among top GWAS results. Genes relating to cell adhesion functions were highly over-represented among significant associations (p < 4.6 × 10^-14), a finding which was not perturbed by a sensitivity analysis.
CONCLUSION: We provide access to a full gene-annotated GWAS database which could be used for further querying, analyses or integration with other genomic information. We make a number of general observations. Of reported associated SNPs, 40% lie within the boundaries of a RefSeq gene and 68% are within 60 kb of one, indicating a bias toward gene-centricity in the findings. We found considerable heterogeneity in information available from GWAS, suggesting the wider community could benefit from standardization and centralization of results reporting.
format Text
id pubmed-2639349
institution National Center for Biotechnology Information
language English
publishDate 2009
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2639349 2009-02-11 An Open Access Database of Genome-wide Association Results Johnson, Andrew D O'Donnell, Christopher J BMC Med Genet Research Article BioMed Central 2009-01-22 /pmc/articles/PMC2639349/ /pubmed/19161620 http://dx.doi.org/10.1186/1471-2350-10-6 Text en Copyright © 2009 Johnson and O'Donnell; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title An Open Access Database of Genome-wide Association Results
topic Research Article