
Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites


Bibliographic Details
Main Authors: Bauer, Amy L., Hlavacek, William S., Unkefer, Pat J., Mu, Fangping
Format: Text
Language: English
Published: Public Library of Science 2010
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987836/
https://www.ncbi.nlm.nih.gov/pubmed/21124945
http://dx.doi.org/10.1371/journal.pcbi.1001007
collection PubMed
description An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate.
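The description outlines a general pipeline: map each candidate site (true binding sites plus stretches of non-coding sequence) to a feature vector in which every base is described in the context of its neighbors, then train a binary classifier to separate true from false sites. The sketch below illustrates only that general idea under loudly stated assumptions. SiteSleuth's actual features come from molecular dynamics calculations, which are not reproduced here; this sketch substitutes a hypothetical one-hot trinucleotide-context encoding and plain logistic regression, and the toy sites and all function names (`site_features`, `train_logistic`, `score`) are made up for illustration.

```python
import math
from itertools import product

BASES = "ACGT"

# Hypothetical stand-in for SiteSleuth's MD-derived features: each base is
# described by its trinucleotide context (the base plus one neighbor on each
# side), one-hot encoded over the 64 possible contexts.
CONTEXT_INDEX = {"".join(t): i for i, t in enumerate(product(BASES, repeat=3))}


def site_features(site: str) -> list[float]:
    """Map a site, padded with one flanking base on each end, to a feature vector."""
    site = site.upper()
    n_core = len(site) - 2  # bases whose full neighborhood is known
    vec = [0.0] * (64 * n_core)
    for pos in range(n_core):
        context = site[pos:pos + 3]
        vec[64 * pos + CONTEXT_INDEX[context]] = 1.0
    return vec


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def score(w: list[float], b: float, x: list[float]) -> float:
    """Probability that x is a true binding site under the trained model."""
    return sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def train_logistic(X, y, epochs=200, lr=0.5):
    """Fit logistic regression by stochastic gradient descent on log-loss."""
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            g = score(w, b, xi) - yi  # gradient of log-loss w.r.t. the logit
            b -= lr * g
            w = [wj - lr * g * xj for wj, xj in zip(w, xi)]
    return w, b


# Toy, made-up data: 1 = true site, 0 = false site (e.g. taken from non-coding DNA).
true_sites = ["ATGACAT", "CTGACAG", "GTGACAA"]
false_sites = ["ACCGTTA", "GGATCCT", "TTTAAAC"]
X = [site_features(s) for s in true_sites + false_sites]
y = [1] * len(true_sites) + [0] * len(false_sites)

w, b = train_logistic(X, y)
for s in true_sites + false_sites:
    print(s, round(score(w, b, site_features(s)), 3))
```

Replacing the one-hot encoding with per-context physical or chemical descriptors would leave the training loop unchanged; estimating false positive rates, as in the evaluation described above, would additionally require held-out data such as a cross-validation split.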
id pubmed-2987836
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal PLoS Comput Biol, Research Article, published 2010-11-18 (pubmed-2987836, 2010-12-01)
rights Bauer et al. Licensed under http://creativecommons.org/licenses/by/4.0/. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
topic Research Article