
Effect of positional dependence and alignment strategy on modeling transcription factor binding sites

BACKGROUND: Many consensus-based and Position Weight Matrix-based methods for recognizing transcription factor binding sites (TFBS) are not well suited to the variability in the lengths of binding sites. Besides, many methods discard known binding sites while building the model. Moreover, the impact...

Bibliographic Details
Main authors: Quader, Saad; Huang, Chun-Hsi
Format: Online Article Text
Language: English
Published: BioMed Central 2012
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3465234/
https://www.ncbi.nlm.nih.gov/pubmed/22748199
http://dx.doi.org/10.1186/1756-0500-5-340
author Quader, Saad
Huang, Chun-Hsi
collection PubMed
description BACKGROUND: Many consensus-based and Position Weight Matrix-based methods for recognizing transcription factor binding sites (TFBS) are not well suited to the variability in the lengths of binding sites. Besides, many methods discard known binding sites while building the model. Moreover, the impact of Information Content (IC) and the positional dependence of nucleotides within an aligned set of TFBS has not been well researched for modeling variable-length binding sites. In this paper, we propose ML-Consensus (Mixed-Length Consensus): a consensus model for variable-length TFBS which does not exclude any reported binding sites. METHODS: We consider Pairwise Score (PS) as a measure of positional dependence of nucleotides within an alignment of TFBS. We investigate how the prediction accuracy of ML-Consensus is affected by the incorporation of IC and PS with a particular binding site alignment strategy. We perform cross-validations for datasets of six species from the TRANSFAC public database, and analyze the results using ROC curves and the Wilcoxon matched-pair signed-ranks test. RESULTS: We observe that the incorporation of IC and PS in ML-Consensus results in statistically significant improvement in the prediction accuracy of the model. Moreover, the existence of a core region among the known binding sites (of any length) is witnessed by the pairwise coexistence of nucleotides within the core length. CONCLUSIONS: These observations suggest the possibility of an efficient multiple sequence alignment algorithm for aligning TFBS, accommodating known binding sites of any length, for optimal (or near-optimal) TFBS prediction. However, designing such an algorithm is a matter of further investigation.
format Online
Article
Text
id pubmed-3465234
institution National Center for Biotechnology Information
language English
publishDate 2012
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3465234 2012-10-10 Effect of positional dependence and alignment strategy on modeling transcription factor binding sites Quader, Saad Huang, Chun-Hsi BMC Res Notes Research Article
BioMed Central 2012-07-02 /pmc/articles/PMC3465234/ /pubmed/22748199 http://dx.doi.org/10.1186/1756-0500-5-340 Text en Copyright ©2012 Quader and Huang; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Effect of positional dependence and alignment strategy on modeling transcription factor binding sites
topic Research Article