
Extracting sequence features to predict protein–DNA interactions: a comparative study

Predicting how and where proteins, especially transcription factors (TFs), interact with DNA is an important problem in biology. We present here a systematic study of predictive modeling approaches to the TF–DNA binding problem, which have been frequently shown to be more efficient than those method...

Bibliographic Details
Main authors:	Zhou, Qing; Liu, Jun S.
Format:	Text
Language:	English
Published:	Oxford University Press 2008
Subjects:	Computational Biology
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2475627/ https://www.ncbi.nlm.nih.gov/pubmed/18556756 http://dx.doi.org/10.1093/nar/gkn361

_version_	1782157565225861120
author	Zhou, Qing; Liu, Jun S.
author_facet	Zhou, Qing; Liu, Jun S.
author_sort	Zhou, Qing
collection	PubMed
description	Predicting how and where proteins, especially transcription factors (TFs), interact with DNA is an important problem in biology. We present here a systematic study of predictive modeling approaches to the TF–DNA binding problem, which have been frequently shown to be more efficient than those methods only based on position-specific weight matrices (PWMs). In these approaches, a statistical relationship between genomic sequences and gene expression or ChIP-binding intensities is inferred through a regression framework; and influential sequence features are identified by variable selection. We examine a few state-of-the-art learning methods including stepwise linear regression, multivariate adaptive regression splines, neural networks, support vector machines, boosting and Bayesian additive regression trees (BART). These methods are applied to both simulated datasets and two whole-genome ChIP-chip datasets on the TFs Oct4 and Sox2, respectively, in human embryonic stem cells. We find that, with proper learning methods, predictive modeling approaches can significantly improve the predictive power and identify more biologically interesting features, such as TF–TF interactions, than the PWM approach. In particular, BART and boosting show the best and the most robust overall performance among all the methods.
format	Text
id	pubmed-2475627
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-2475627 2008-07-21 Extracting sequence features to predict protein–DNA interactions: a comparative study Zhou, Qing; Liu, Jun S. Nucleic Acids Res Computational Biology
Oxford University Press 2008-07 2008-06-13 /pmc/articles/PMC2475627/ /pubmed/18556756 http://dx.doi.org/10.1093/nar/gkn361 Text en © 2008 The Author(s) http://creativecommons.org/licenses/by-nc/2.0/uk/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/) which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Extracting sequence features to predict protein–DNA interactions: a comparative study
topic	Computational Biology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2475627/ https://www.ncbi.nlm.nih.gov/pubmed/18556756 http://dx.doi.org/10.1093/nar/gkn361
