
Establishing and validating regulatory regions for variant annotation and expression analysis

BACKGROUND: The regulatory effect of inherited or de novo genetic variants occurring in promoters as well as in transcribed or even coding gene regions is gaining greater recognition as a contributing factor to disease processes in addition to mutations affecting protein functionality. Thousands of...


Bibliographic Details
Main authors: Kaplun, Alexander; Krull, Mathias; Lakshman, Karthick; Matys, Volker; Lewicki, Birgit; Hogan, Jennifer D.
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4928138/
https://www.ncbi.nlm.nih.gov/pubmed/27357948
http://dx.doi.org/10.1186/s12864-016-2724-0
_version_ 1782440385964933120
author Kaplun, Alexander
Krull, Mathias
Lakshman, Karthick
Matys, Volker
Lewicki, Birgit
Hogan, Jennifer D.
author_sort Kaplun, Alexander
collection PubMed
description BACKGROUND: The regulatory effect of inherited or de novo genetic variants occurring in promoters as well as in transcribed or even coding gene regions is gaining greater recognition as a contributing factor to disease processes in addition to mutations affecting protein functionality. Thousands of such regulatory mutations are already recorded in HGMD, OMIM, ClinVar and other databases containing published disease-causing and disease-associated mutations. It is therefore important to properly annotate genetic variants occurring in experimentally verified and predicted transcription factor binding sites (TFBS) that could thus influence the factor binding event. Selection of the promoter sequence used is an important factor in the analysis as it directly influences the composition of the sequence available for transcription factor binding analysis. RESULTS: In this study we first establish genomic regions likely to be involved in regulation of gene expression. TRANSFAC uses a method of virtual transcription start sites (vTSS) calculation to define the best supported promoter for a gene. We have performed a comparison of the virtually calculated promoters between the best supported and secondary promoters in hg19 and hg38 reference genomes to test and validate the approach. Next we create and utilize a workflow for systematic analysis of causal disease-associated variants in TFBS using Genome Trax and TRANSFAC databases. A total of 841 and 736 experimentally verified TFBSs within best supported promoters were mapped over HGMD and ClinVar mutation sites respectively. Tens of thousands of predicted ChIP-Seq-derived TFBSs were mapped over mutations as well. We have further analyzed some of these mutations for potential gain or loss in transcription factor binding. CONCLUSIONS: We have confirmed the validity of TRANSFAC’s approach to define the best supported promoters and established a workflow of their use in annotation of regulatory genetic variants.
format Online
Article
Text
id pubmed-4928138
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4928138 2016-06-30 Establishing and validating regulatory regions for variant annotation and expression analysis Kaplun, Alexander Krull, Mathias Lakshman, Karthick Matys, Volker Lewicki, Birgit Hogan, Jennifer D. BMC Genomics Methodology Article BACKGROUND: The regulatory effect of inherited or de novo genetic variants occurring in promoters as well as in transcribed or even coding gene regions is gaining greater recognition as a contributing factor to disease processes in addition to mutations affecting protein functionality. Thousands of such regulatory mutations are already recorded in HGMD, OMIM, ClinVar and other databases containing published disease-causing and disease-associated mutations. It is therefore important to properly annotate genetic variants occurring in experimentally verified and predicted transcription factor binding sites (TFBS) that could thus influence the factor binding event. Selection of the promoter sequence used is an important factor in the analysis as it directly influences the composition of the sequence available for transcription factor binding analysis. RESULTS: In this study we first establish genomic regions likely to be involved in regulation of gene expression. TRANSFAC uses a method of virtual transcription start sites (vTSS) calculation to define the best supported promoter for a gene. We have performed a comparison of the virtually calculated promoters between the best supported and secondary promoters in hg19 and hg38 reference genomes to test and validate the approach. Next we create and utilize a workflow for systematic analysis of causal disease-associated variants in TFBS using Genome Trax and TRANSFAC databases. A total of 841 and 736 experimentally verified TFBSs within best supported promoters were mapped over HGMD and ClinVar mutation sites respectively. Tens of thousands of predicted ChIP-Seq-derived TFBSs were mapped over mutations as well. We have further analyzed some of these mutations for potential gain or loss in transcription factor binding. CONCLUSIONS: We have confirmed the validity of TRANSFAC’s approach to define the best supported promoters and established a workflow of their use in annotation of regulatory genetic variants. BioMed Central 2016-06-23 /pmc/articles/PMC4928138/ /pubmed/27357948 http://dx.doi.org/10.1186/s12864-016-2724-0 Text en © Kaplun et al. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Establishing and validating regulatory regions for variant annotation and expression analysis
title_sort establishing and validating regulatory regions for variant annotation and expression analysis
topic Methodology Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4928138/
https://www.ncbi.nlm.nih.gov/pubmed/27357948
http://dx.doi.org/10.1186/s12864-016-2724-0