Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites

An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical a...

Bibliographic Details
Main Authors:	Bauer, Amy L.; Hlavacek, William S.; Unkefer, Pat J.; Mu, Fangping
Format:	Text
Language:	English
Published:	Public Library of Science 2010
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987836/ https://www.ncbi.nlm.nih.gov/pubmed/21124945 http://dx.doi.org/10.1371/journal.pcbi.1001007

author	Bauer, Amy L.; Hlavacek, William S.; Unkefer, Pat J.; Mu, Fangping
collection	PubMed
description	An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
format	Text
id	pubmed-2987836
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-2987836 2010-12-01 PLoS Comput Biol Research Article Public Library of Science 2010-11-18 /pmc/articles/PMC2987836/ /pubmed/21124945 http://dx.doi.org/10.1371/journal.pcbi.1001007 Text en Bauer et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title	Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987836/ https://www.ncbi.nlm.nih.gov/pubmed/21124945 http://dx.doi.org/10.1371/journal.pcbi.1001007
