
On the identification of potential regulatory variants within genome wide association candidate SNP sets

BACKGROUND: Genome wide association studies (GWAS) are a population-scale approach to the identification of segments of the genome in which genetic variations may contribute to disease risk. Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease trai...


Bibliographic Details
Main authors: Chen, Chih-yu, Chang, I-Shou, Hsiung, Chao A, Wasserman, Wyeth W
Format: Online Article Text
Language: English
Published: BioMed Central 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4066296/
https://www.ncbi.nlm.nih.gov/pubmed/24920305
http://dx.doi.org/10.1186/1755-8794-7-34
_version_ 1782322164393836544
author Chen, Chih-yu
Chang, I-Shou
Hsiung, Chao A
Wasserman, Wyeth W
author_sort Chen, Chih-yu
collection PubMed
description BACKGROUND: Genome wide association studies (GWAS) are a population-scale approach to the identification of segments of the genome in which genetic variations may contribute to disease risk. Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. As there are many SNPs within identified risk loci, and the majority of these are situated within non-coding regions, a key challenge is to identify and prioritize variants affecting regulatory sequences that are likely to contribute to the phenotype assessed. METHODS: We focused investigation on SNPs within lung and breast cancer GWAS loci that reached genome-wide significance for potential roles in gene regulation with a specific focus on SNPs likely to disrupt transcription factor binding sites. Within risk loci, the regulatory potential of sub-regions was classified using relevant open chromatin and epigenetic high throughput sequencing data sets from the ENCODE project in available cancer and normal cell lines. Furthermore, transcription factor affinity altering variants were predicted by comparison of position weight matrix scores between disease and reference alleles. Lastly, ChIP-seq data of transcription associated factors and topological domains were included as binding evidence and potential gene target inference. RESULTS: The sets of SNPs, including both the disease-associated markers and those in high linkage disequilibrium with them, were significantly over-represented in regulatory sequences of cancer and/or normal cells; however, over-representation was generally not restricted to disease-relevant tissue specific regions. The calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence were the three criteria used to prioritize candidates. 
Fitting all three criteria, we highlighted breast cancer susceptibility SNPs and a borderline lung cancer relevant SNP located in cancer-specific enhancers overlapping multiple distinct transcription associated factor ChIP-seq binding sites. CONCLUSION: Incorporating high throughput sequencing epigenetic and transcription factor data sets from both cancer and normal cells into cancer genetic studies reveals potential functional SNPs and informs subsequent characterization efforts.
format Online
Article
Text
id pubmed-4066296
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-40662962014-06-24 On the identification of potential regulatory variants within genome wide association candidate SNP sets Chen, Chih-yu Chang, I-Shou Hsiung, Chao A Wasserman, Wyeth W BMC Med Genomics Research Article BACKGROUND: Genome wide association studies (GWAS) are a population-scale approach to the identification of segments of the genome in which genetic variations may contribute to disease risk. Current methods focus on the discovery of single nucleotide polymorphisms (SNPs) associated with disease traits. As there are many SNPs within identified risk loci, and the majority of these are situated within non-coding regions, a key challenge is to identify and prioritize variants affecting regulatory sequences that are likely to contribute to the phenotype assessed. METHODS: We focused investigation on SNPs within lung and breast cancer GWAS loci that reached genome-wide significance for potential roles in gene regulation with a specific focus on SNPs likely to disrupt transcription factor binding sites. Within risk loci, the regulatory potential of sub-regions was classified using relevant open chromatin and epigenetic high throughput sequencing data sets from the ENCODE project in available cancer and normal cell lines. Furthermore, transcription factor affinity altering variants were predicted by comparison of position weight matrix scores between disease and reference alleles. Lastly, ChIP-seq data of transcription associated factors and topological domains were included as binding evidence and potential gene target inference. RESULTS: The sets of SNPs, including both the disease-associated markers and those in high linkage disequilibrium with them, were significantly over-represented in regulatory sequences of cancer and/or normal cells; however, over-representation was generally not restricted to disease-relevant tissue specific regions. 
The calculated regulatory potential, allelic binding affinity scores and ChIP-seq binding evidence were the three criteria used to prioritize candidates. Fitting all three criteria, we highlighted breast cancer susceptibility SNPs and a borderline lung cancer relevant SNP located in cancer-specific enhancers overlapping multiple distinct transcription associated factor ChIP-seq binding sites. CONCLUSION: Incorporating high throughput sequencing epigenetic and transcription factor data sets from both cancer and normal cells into cancer genetic studies reveals potential functional SNPs and informs subsequent characterization efforts. BioMed Central 2014-06-11 /pmc/articles/PMC4066296/ /pubmed/24920305 http://dx.doi.org/10.1186/1755-8794-7-34 Text en Copyright © 2014 Chen et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Chen, Chih-yu
Chang, I-Shou
Hsiung, Chao A
Wasserman, Wyeth W
On the identification of potential regulatory variants within genome wide association candidate SNP sets
title On the identification of potential regulatory variants within genome wide association candidate SNP sets
title_sort on the identification of potential regulatory variants within genome wide association candidate snp sets
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4066296/
https://www.ncbi.nlm.nih.gov/pubmed/24920305
http://dx.doi.org/10.1186/1755-8794-7-34