
BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations

Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide s...


Bibliographic Details
Main authors: Wang, Junbai; Batmanov, Kirill
Format: Online Article Text
Language: English
Published: Oxford University Press 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4666384/
https://www.ncbi.nlm.nih.gov/pubmed/26202972
http://dx.doi.org/10.1093/nar/gkv733
author Wang, Junbai
Batmanov, Kirill
collection PubMed
description Sequence variations in regulatory DNA regions are known to cause functionally important consequences for gene expression. DNA sequence variations may have an essential role in determining phenotypes and may be linked to disease; however, their identification through analysis of massive genome-wide sequencing data is a great challenge. In this work, a new computational pipeline, a Bayesian method for protein–DNA interaction with binding affinity ranking (BayesPI-BAR), is proposed for quantifying the effect of sequence variations on protein binding. BayesPI-BAR uses biophysical modeling of protein–DNA interactions to predict single nucleotide polymorphisms (SNPs) that cause significant changes in the binding affinity of a regulatory region for transcription factors (TFs). The method includes two new parameters (TF chemical potentials or protein concentrations and direct TF binding targets) that are neglected by previous methods. The new method is verified on 67 known human regulatory SNPs, of which 47 (70%) have predicted true TFs ranked in the top 10. Importantly, the performance of BayesPI-BAR, which uses principal component analysis to integrate multiple predictions from various TF chemical potentials, is found to be better than that of existing programs, such as sTRAP and is-rSNP, when evaluated on the same SNPs. BayesPI-BAR is a publicly available tool and is able to carry out parallelized computation, which helps to investigate a large number of TFs or SNPs and to detect disease-associated regulatory sequence variations in the sea of genome-wide noncoding regions.
format Online
Article
Text
id pubmed-4666384
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-4666384 2015-12-02 BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations Wang, Junbai Batmanov, Kirill Nucleic Acids Res Methods Online Oxford University Press 2015-12-02 2015-07-21 /pmc/articles/PMC4666384/ /pubmed/26202972 http://dx.doi.org/10.1093/nar/gkv733 Text en © The Author(s) 2015. Published by Oxford University Press on behalf of Nucleic Acids Research. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title BayesPI-BAR: a new biophysical model for characterization of regulatory sequence variations
topic Methods Online