
Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels

BACKGROUND: A quantitative understanding of interactions between transcription factors (TFs) and their DNA binding sites is key to the rational design of gene regulatory networks. Recent advances in high-throughput technologies have enabled high-resolution measurements of protein-DNA binding affinit...


Bibliographic Details
Main authors:	Wang, Xiaolei; Kuwahara, Hiroyuki; Gao, Xin
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2014
Subjects:	Research
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4305984/ https://www.ncbi.nlm.nih.gov/pubmed/25605483 http://dx.doi.org/10.1186/1752-0509-8-S5-S5

_version_	1782354256491184128
author	Wang, Xiaolei Kuwahara, Hiroyuki Gao, Xin
author_facet	Wang, Xiaolei Kuwahara, Hiroyuki Gao, Xin
author_sort	Wang, Xiaolei
collection	PubMed
description	BACKGROUND: A quantitative understanding of interactions between transcription factors (TFs) and their DNA binding sites is key to the rational design of gene regulatory networks. Recent advances in high-throughput technologies have enabled high-resolution measurements of protein-DNA binding affinity. Importantly, such experiments revealed the complex nature of TF-DNA interactions, whereby the effects of nucleotide changes on the binding affinity were observed to be context dependent. A systematic method to give high-quality estimates of such complex affinity landscapes is, thus, essential to the control of gene expression and the advance of synthetic biology. RESULTS: Here, we propose a two-round prediction method that is based on support vector regression (SVR) with weighted degree (WD) kernels. In the first round, a WD kernel with shifts and mismatches is used with SVR to detect the importance of subsequences with different lengths at different positions. The subsequences identified as important in the first round are then fed into a second WD kernel to fit the experimentally measured affinities. To our knowledge, this is the first attempt to increase the accuracy of the affinity prediction by applying two rounds of string kernels and by identifying a small number of crucial k-mers. The proposed method was tested by predicting the binding affinity landscape of Gcn4p in Saccharomyces cerevisiae using datasets from HiTS-FLIP. Our method explicitly identified important subsequences and showed significant performance improvements when compared with other state-of-the-art methods. Based on the identified important subsequences, we discovered two surprisingly stable 10-mers and one sensitive 10-mer which were not reported before. Further test on four other TFs in S. cerevisiae demonstrated the generality of our method. CONCLUSION: We proposed in this paper a two-round method to quantitatively model the DNA binding affinity landscape. Since the ability to modify genetic parts to fine-tune gene expression rates is crucial to the design of biological systems, such a tool may play an important role in the success of synthetic biology going forward.
format	Online Article Text
id	pubmed-4305984
institution	National Center for Biotechnology Information
language	English
publishDate	2014
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-4305984 2015-02-12 Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels Wang, Xiaolei Kuwahara, Hiroyuki Gao, Xin BMC Syst Biol Research BACKGROUND: A quantitative understanding of interactions between transcription factors (TFs) and their DNA binding sites is key to the rational design of gene regulatory networks. Recent advances in high-throughput technologies have enabled high-resolution measurements of protein-DNA binding affinity. Importantly, such experiments revealed the complex nature of TF-DNA interactions, whereby the effects of nucleotide changes on the binding affinity were observed to be context dependent. A systematic method to give high-quality estimates of such complex affinity landscapes is, thus, essential to the control of gene expression and the advance of synthetic biology. RESULTS: Here, we propose a two-round prediction method that is based on support vector regression (SVR) with weighted degree (WD) kernels. In the first round, a WD kernel with shifts and mismatches is used with SVR to detect the importance of subsequences with different lengths at different positions. The subsequences identified as important in the first round are then fed into a second WD kernel to fit the experimentally measured affinities. To our knowledge, this is the first attempt to increase the accuracy of the affinity prediction by applying two rounds of string kernels and by identifying a small number of crucial k-mers. The proposed method was tested by predicting the binding affinity landscape of Gcn4p in Saccharomyces cerevisiae using datasets from HiTS-FLIP. Our method explicitly identified important subsequences and showed significant performance improvements when compared with other state-of-the-art methods. Based on the identified important subsequences, we discovered two surprisingly stable 10-mers and one sensitive 10-mer which were not reported before. Further test on four other TFs in S. cerevisiae demonstrated the generality of our method. CONCLUSION: We proposed in this paper a two-round method to quantitatively model the DNA binding affinity landscape. Since the ability to modify genetic parts to fine-tune gene expression rates is crucial to the design of biological systems, such a tool may play an important role in the success of synthetic biology going forward. BioMed Central 2014-12-12 /pmc/articles/PMC4305984/ /pubmed/25605483 http://dx.doi.org/10.1186/1752-0509-8-S5-S5 Text en Copyright © 2014 Wang et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/4.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle	Research Wang, Xiaolei Kuwahara, Hiroyuki Gao, Xin Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels
title	Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels
title_full	Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels
title_fullStr	Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels
title_full_unstemmed	Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels
title_short	Modeling DNA affinity landscape through two-round support vector regression with weighted degree kernels
title_sort	modeling dna affinity landscape through two-round support vector regression with weighted degree kernels
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4305984/ https://www.ncbi.nlm.nih.gov/pubmed/25605483 http://dx.doi.org/10.1186/1752-0509-8-S5-S5
work_keys_str_mv	AT wangxiaolei modelingdnaaffinitylandscapethroughtworoundsupportvectorregressionwithweighteddegreekernels AT kuwaharahiroyuki modelingdnaaffinitylandscapethroughtworoundsupportvectorregressionwithweighteddegreekernels AT gaoxin modelingdnaaffinitylandscapethroughtworoundsupportvectorregressionwithweighteddegreekernels
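
Illustrative note: the abstract in this record describes support vector regression with weighted degree (WD) kernels. As a rough aid to readers, the following is a minimal sketch of the plain WD kernel in its standard textbook form, which counts k-mers that match at the same position in two equal-length sequences and down-weights longer k-mers. It is not the authors' implementation: it omits the shifts and mismatches mentioned in the abstract and the second round over selected subsequences. The function names, the default degree, and the toy sequences are hypothetical.

```python
def wd_kernel(x: str, y: str, degree: int = 3) -> float:
    """Plain weighted degree kernel between two equal-length sequences.

    For each k in 1..degree, count positions where the k-mers of x and y
    agree, and weight the count by beta_k = 2(d - k + 1) / (d(d + 1)).
    Assumption: no shifts and no mismatches, unlike the kernel in the abstract.
    """
    if len(x) != len(y):
        raise ValueError("sequences must have equal length")
    d = degree
    total = 0.0
    for k in range(1, d + 1):
        beta = 2.0 * (d - k + 1) / (d * (d + 1))
        matches = sum(
            1 for i in range(len(x) - k + 1) if x[i:i + k] == y[i:i + k]
        )
        total += beta * matches
    return total


def gram_matrix(seqs, degree=3):
    """Pairwise WD kernel values for a list of sequences."""
    return [[wd_kernel(a, b, degree) for b in seqs] for a in seqs]


if __name__ == "__main__":
    # Hypothetical toy sequences, not data from the paper.
    toy = ["ACGT", "ACGA", "TTTT"]
    for row in gram_matrix(toy, degree=2):
        print(row)
```

A Gram matrix of this kind could be handed to any SVR implementation that accepts a precomputed kernel (for example, scikit-learn's `SVR(kernel="precomputed")`), with the measured affinities as regression targets; that pairing is an assumption for illustration and is not specified in this record.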
