Using Sequence-Specific Chemical and Structural Properties of DNA to Predict Transcription Factor Binding Sites

Main authors: | Bauer, Amy L.; Hlavacek, William S.; Unkefer, Pat J.; Mu, Fangping |
---|---|
Format: | Text |
Language: | English |
Published: | Public Library of Science, 2010 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2987836/ https://www.ncbi.nlm.nih.gov/pubmed/21124945 http://dx.doi.org/10.1371/journal.pcbi.1001007 |

Field | Value |
---|---|
author | Bauer, Amy L.; Hlavacek, William S.; Unkefer, Pat J.; Mu, Fangping |
collection | PubMed |
description | An important step in understanding gene regulation is to identify the DNA binding sites recognized by each transcription factor (TF). Conventional approaches to prediction of TF binding sites involve the definition of consensus sequences or position-specific weight matrices and rely on statistical analysis of DNA sequences of known binding sites. Here, we present a method called SiteSleuth in which DNA structure prediction, computational chemistry, and machine learning are applied to develop models for TF binding sites. In this approach, binary classifiers are trained to discriminate between true and false binding sites based on the sequence-specific chemical and structural features of DNA. These features are determined via molecular dynamics calculations in which we consider each base in different local neighborhoods. For each of 54 TFs in Escherichia coli, for which at least five DNA binding sites are documented in RegulonDB, the TF binding sites and portions of the non-coding genome sequence are mapped to feature vectors and used in training. According to cross-validation analysis and a comparison of computational predictions against ChIP-chip data available for the TF Fis, SiteSleuth outperforms three conventional approaches: Match, MATRIX SEARCH, and the method of Berg and von Hippel. SiteSleuth also outperforms QPMEME, a method similar to SiteSleuth in that it involves a learning algorithm. The main advantage of SiteSleuth is a lower false positive rate. |
id | pubmed-2987836 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-2987836; 2010-12-01; PLoS Comput Biol; Research Article; Public Library of Science, 2010-11-18; Text; en; Bauer et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
topic | Research Article |
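
The abstract outlines the SiteSleuth pipeline only at a high level: each base is considered in a local sequence neighborhood, the resulting sequence-specific features are concatenated into a feature vector, a binary classifier is trained to separate true binding sites from non-sites, and performance is judged largely by the false positive rate. The Python sketch below illustrates that general shape under explicit assumptions. The per-trinucleotide numbers from the hypothetical `placeholder_features` are stand-ins, not molecular dynamics results; the three-base neighborhood, the logistic regression classifier (swapped in purely for illustration, since the abstract says only "binary classifiers"), and every function name are illustrative choices rather than the paper's implementation.

```python
"""Hypothetical sketch of a feature-vector + binary-classifier pipeline for
TF binding sites. Feature values are PLACEHOLDERS, not MD-derived data, and
logistic regression is an illustrative choice of classifier."""
import math

BASES = "ACGT"


def placeholder_features(trinuc):
    """HYPOTHETICAL stand-in for properties of the central base in the
    context of its two neighbors; returns two numbers per trinucleotide."""
    # Encode the trinucleotide as a base-4 number in 0..63.
    index = sum(BASES.index(b) * 4 ** i for i, b in enumerate(trinuc))
    gc_fraction = sum(b in "GC" for b in trinuc) / 3
    return (index / 63, gc_fraction)


def site_to_vector(site):
    """Concatenate features for every interior base (one 3-base window each)."""
    vec = []
    for i in range(1, len(site) - 1):
        vec.extend(placeholder_features(site[i - 1:i + 2]))
    return vec


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(vectors, labels, lr=0.4, steps=2000):
    """Full-batch gradient descent on the mean logistic loss; returns (w, b)."""
    n, d = len(vectors), len(vectors[0])
    w, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(vectors, labels):
            # (p - y) is the per-example gradient of the loss w.r.t. the score.
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            for j in range(d):
                grad_w[j] += err * x[j] / n
            grad_b += err / n
        w = [wi - lr * gi for wi, gi in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x, threshold=0.5):
    """Return 1 (predicted site) or 0 (predicted non-site)."""
    return int(sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) >= threshold)


def false_positive_rate(predictions, labels):
    """FP / (FP + TN): fraction of true non-sites wrongly called sites."""
    fp = sum(1 for p, y in zip(predictions, labels) if p == 1 and y == 0)
    tn = sum(1 for p, y in zip(predictions, labels) if p == 0 and y == 0)
    return fp / (fp + tn) if fp + tn else 0.0


if __name__ == "__main__":
    # Toy, HYPOTHETICAL data: GC-rich "sites" versus AT-rich "non-sites".
    sites = ["GGGGGG", "CCCCCC"]
    non_sites = ["AAAAAA", "TTTTTT"]
    X = [site_to_vector(s) for s in sites + non_sites]
    y = [1] * len(sites) + [0] * len(non_sites)
    w, b = train_logistic(X, y)
    preds = [predict(w, b, x) for x in X]
    print("predictions:", preds)
    print("false positive rate:", false_positive_rate(preds, y))
```

Run as a script, it trains on four toy sequences and prints the training-set predictions and false positive rate. A realistic evaluation would instead use computed features, known sites plus non-coding background sequence, and cross-validation on held-out data, as the abstract describes for the 54 E. coli TFs.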