Improved predictions of transcription factor binding sites using physicochemical features of DNA

Bibliographic Details
Main Authors: Maienschein-Cline, Mark; Dinner, Aaron R.; Hlavacek, William S.; Mu, Fangping
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects: Methods Online
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3526315/
https://www.ncbi.nlm.nih.gov/pubmed/22923524
http://dx.doi.org/10.1093/nar/gks771
author Maienschein-Cline, Mark
Dinner, Aaron R.
Hlavacek, William S.
Mu, Fangping
collection PubMed
description Typical approaches for predicting transcription factor binding sites (TFBSs) involve use of a position-specific weight matrix (PWM) to statistically characterize the sequences of the known sites. Recently, an alternative physicochemical approach, called SiteSleuth, was proposed. In this approach, a linear support vector machine (SVM) classifier is trained to distinguish TFBSs from background sequences based on local chemical and structural features of DNA. SiteSleuth appears to generally perform better than PWM-based methods. Here, we improve the SiteSleuth approach by considering both new physicochemical features and algorithmic modifications. New features are derived from Gibbs energies of amino acid–DNA interactions and hydroxyl radical cleavage profiles of DNA. Algorithmic modifications consist of inclusion of a feature selection step, use of a nonlinear kernel in the SVM classifier, and use of a consensus-based post-processing step for predictions. We also considered SVM classification based on letter features alone to distinguish performance gains from use of SVM-based models versus use of physicochemical features. The accuracy of each of the variant methods considered was assessed by cross validation using data available in the RegulonDB database for 54 Escherichia coli TFs, as well as by experimental validation using published ChIP-chip data available for Fis and Lrp.
format Online
Article
Text
id pubmed-3526315
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3526315 2013-01-04 Improved predictions of transcription factor binding sites using physicochemical features of DNA Maienschein-Cline, Mark; Dinner, Aaron R.; Hlavacek, William S.; Mu, Fangping Nucleic Acids Res Methods Online Oxford University Press 2012-12 2012-08-24 /pmc/articles/PMC3526315/ /pubmed/22923524 http://dx.doi.org/10.1093/nar/gks771 Text en © The Author(s) 2012. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Improved predictions of transcription factor binding sites using physicochemical features of DNA
topic Methods Online