Effect of positional dependence and alignment strategy on modeling transcription factor binding sites
Main authors: | Quader, Saad; Huang, Chun-Hsi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2012 |
Subjects: | Research Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3465234/ https://www.ncbi.nlm.nih.gov/pubmed/22748199 http://dx.doi.org/10.1186/1756-0500-5-340 |
author | Quader, Saad; Huang, Chun-Hsi |
collection | PubMed |
description | BACKGROUND: Many consensus-based and Position Weight Matrix-based methods for recognizing transcription factor binding sites (TFBS) are not well suited to the variability in the lengths of binding sites. In addition, many methods discard known binding sites while building the model. Moreover, the impact of Information Content (IC) and of the positional dependence of nucleotides within an aligned set of TFBS has not been well researched for modeling variable-length binding sites. In this paper, we propose ML-Consensus (Mixed-Length Consensus): a consensus model for variable-length TFBS that does not exclude any reported binding sites. METHODS: We consider the Pairwise Score (PS) as a measure of the positional dependence of nucleotides within an alignment of TFBS. We investigate how the prediction accuracy of ML-Consensus is affected by incorporating IC and PS under a particular binding site alignment strategy. We perform cross-validation on datasets for six species from the public TRANSFAC database and analyze the results using ROC curves and the Wilcoxon matched-pairs signed-ranks test. RESULTS: We observe that incorporating IC and PS into ML-Consensus yields a statistically significant improvement in the prediction accuracy of the model. Moreover, the existence of a core region among the known binding sites (of any length) is indicated by the pairwise coexistence of nucleotides within the core length. CONCLUSIONS: These observations suggest the possibility of an efficient multiple sequence alignment algorithm for aligning TFBS that accommodates known binding sites of any length and yields optimal (or near-optimal) TFBS prediction. However, designing such an algorithm remains a matter for further investigation. |
id | pubmed-3465234 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
spelling | pubmed-3465234 (2012-10-10); BMC Res Notes; Research Article; BioMed Central, 2012-07-02. Copyright ©2012 Quader and Huang; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
topic | Research Article |
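This record does not say how Information Content (IC) is computed for an aligned set of TFBS. As an illustration only, the following Python sketch uses the standard Shannon-based per-column IC for DNA (2 bits minus the column entropy). It assumes that variable-length sites have already been aligned into rows of equal width, padded with a hypothetical gap character `-` that is ignored when counting. The function name `column_information_content` is hypothetical. This is not the ML-Consensus implementation: it applies no small-sample correction and does not compute the Pairwise Score (PS).

```python
import math
from collections import Counter

NUCLEOTIDES = "ACGT"


def column_information_content(alignment: list[str], gap: str = "-") -> list[float]:
    """Per-column information content (bits) of an alignment of binding sites.

    Hypothetical helper: assumes all rows have equal length, with shorter
    sites padded by `gap`. Gap (and any non-ACGT) characters are skipped
    when estimating column frequencies.
    """
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned rows must have the same length")

    ic = []
    for j in range(width):
        # Count only A/C/G/T in column j; padding does not contribute.
        counts = Counter(
            b for b in (row[j].upper() for row in alignment) if b in NUCLEOTIDES
        )
        total = sum(counts.values())
        if total == 0:
            # Column is all gaps: treat it as carrying no information.
            ic.append(0.0)
            continue
        # Shannon entropy of the observed nucleotide distribution.
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        # Maximum entropy for DNA is 2 bits, so IC = 2 - H.
        ic.append(2.0 - entropy)
    return ic


if __name__ == "__main__":
    # Three toy sites of lengths 4, 4 and 3 (the last padded with '-').
    print(column_information_content(["ACGT", "ACGA", "ACG-"]))
```

For the toy alignment, the first three columns are fully conserved (2 bits each), and the last column splits evenly between T and A once the gap is ignored, giving `[2.0, 2.0, 2.0, 1.0]`. Ignoring padding, rather than treating it as a fifth symbol, is one simple way to keep shorter sites in the model without discarding them. How the authors actually weight positions is not stated in this record.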