
A flexible integrative approach based on random forest improves prediction of transcription factor binding sites

Transcription factor binding sites (TFBSs) are DNA sequences of 6–15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by...


Bibliographic Details
Main Authors: Hooghe, Bart; Broos, Stefan; van Roy, Frans; De Bleser, Pieter
Format: Online Article Text
Language: English
Published: Oxford University Press 2012
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3413102/
https://www.ncbi.nlm.nih.gov/pubmed/22492513
http://dx.doi.org/10.1093/nar/gks283
author Hooghe, Bart
Broos, Stefan
van Roy, Frans
De Bleser, Pieter
author_sort Hooghe, Bart
collection PubMed
description Transcription factor binding sites (TFBSs) are DNA sequences of 6–15 base pairs. Interaction of these TFBSs with transcription factors (TFs) is largely responsible for most spatiotemporal gene expression patterns. Here, we evaluate to what extent sequence-based prediction of TFBSs can be improved by taking into account the positional dependencies of nucleotides (NPDs) and the nucleotide sequence-dependent structure of DNA. We make use of the random forest algorithm to flexibly exploit both types of information. Results in this study show that both the structural method and the NPD method can be valuable for the prediction of TFBSs. Moreover, their predictive values seem to be complementary, even to the widely used position weight matrix (PWM) method. This led us to combine all three methods. Results obtained for five eukaryotic TFs with different DNA-binding domains show that our method improves classification accuracy for all five eukaryotic TFs compared with other approaches. Additionally, we contrast the results of seven smaller prokaryotic sets with high-quality data and show that with the use of high-quality data we can significantly improve prediction performance. Models developed in this study can be of great use for gaining insight into the mechanisms of TF binding.
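The position weight matrix (PWM) method that the abstract names as the widely used baseline can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names `build_pwm`, `score`, and `best_hit`, the pseudocount of 1, the uniform background of 0.25, and the toy sites are all assumptions made for this example. The structural and NPD features, and the random forest that combines them, are not shown.

```python
import math
from collections import Counter

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PWM from aligned, equal-length binding sites.

    The pseudocount and the uniform background are illustrative
    assumptions, not values taken from the record above.
    """
    length = len(sites[0])
    if any(len(site) != length for site in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + pseudocount * len(BASES)
    pwm = []
    for pos in range(length):
        # Count how often each base occurs at this position.
        counts = Counter(site[pos] for site in sites)
        pwm.append({
            base: math.log2(((counts[base] + pseudocount) / total) / background)
            for base in BASES
        })
    return pwm


def score(pwm, seq):
    """Sum the per-position log-odds for a candidate site of PWM width."""
    if len(seq) != len(pwm):
        raise ValueError("sequence length must equal PWM width")
    return sum(column[base] for column, base in zip(pwm, seq))


def best_hit(pwm, sequence):
    """Slide the PWM along a longer sequence; return (best score, start index)."""
    width = len(pwm)
    return max(
        (score(pwm, sequence[i:i + width]), i)
        for i in range(len(sequence) - width + 1)
    )


# Hypothetical aligned sites, used only to exercise the functions.
toy_sites = ["TATAAA", "TATAAT", "TATATA", "TTTAAA"]
toy_pwm = build_pwm(toy_sites)
toy_best = best_hit(toy_pwm, "GGGCTATAAAGGC")
```

A PWM scores each position independently, which is why features that capture dependencies between positions or DNA structure, as described in the abstract, can add complementary information.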
format Online
Article
Text
id pubmed-3413102
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-3413102 2012-08-07 A flexible integrative approach based on random forest improves prediction of transcription factor binding sites Hooghe, Bart Broos, Stefan van Roy, Frans De Bleser, Pieter Nucleic Acids Res Methods Online Oxford University Press 2012-08 2012-04-05 /pmc/articles/PMC3413102/ /pubmed/22492513 http://dx.doi.org/10.1093/nar/gks283 Text en © The Author(s) 2012. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
title A flexible integrative approach based on random forest improves prediction of transcription factor binding sites
topic Methods Online