Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites

BACKGROUND: Biologically active sequence motifs often have positional preferences with respect to a genomic landmark. For example, many known transcription factor binding sites (TFBSs) occur within an interval [-300, 0] bases upstream of a transcription start site (TSS). Although some programs for i...

Bibliographic Details
Main authors: Kim, Nak-Kyeong, Tharakaraman, Kannan, Mariño-Ramírez, Leonardo, Spouge, John L
Format: Text
Language: English
Published: BioMed Central 2008
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2432075/
https://www.ncbi.nlm.nih.gov/pubmed/18533028
http://dx.doi.org/10.1186/1471-2105-9-262
_version_ 1782156456760442880
author Kim, Nak-Kyeong
Tharakaraman, Kannan
Mariño-Ramírez, Leonardo
Spouge, John L
author_facet Kim, Nak-Kyeong
Tharakaraman, Kannan
Mariño-Ramírez, Leonardo
Spouge, John L
author_sort Kim, Nak-Kyeong
collection PubMed
description BACKGROUND: Biologically active sequence motifs often have positional preferences with respect to a genomic landmark. For example, many known transcription factor binding sites (TFBSs) occur within an interval [-300, 0] bases upstream of a transcription start site (TSS). Although some programs for identifying sequence motifs exploit positional information, most of them model it only implicitly and with ad hoc methods, making them unsuitable for general motif searches. RESULTS: A-GLAM, a user-friendly computer program for identifying sequence motifs, now incorporates a Bayesian model systematically combining sequence and positional information. A-GLAM's predictions with and without positional information were compared on two human TFBS datasets, each containing sequences corresponding to the interval [-2000, 0] bases upstream of a known TSS. A rigorous statistical analysis showed that positional information significantly improved the prediction of sequence motifs, and an extensive cross-validation study showed that A-GLAM's model was robust against mild misspecification of its parameters. As expected, when sequences in the datasets were successively truncated to the intervals [-1000, 0], [-500, 0] and [-250, 0], positional information aided motif prediction less and less, but never hurt it significantly. CONCLUSION: Although sequence truncation is a viable strategy when searching for biologically active motifs with a positional preference, a probabilistic model (used reasonably) generally provides a superior and more robust strategy, particularly when the sequence motifs' positional preferences are not well characterized.
format Text
id pubmed-2432075
institution National Center for Biotechnology Information
language English
publishDate 2008
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-24320752008-06-20 Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites Kim, Nak-Kyeong Tharakaraman, Kannan Mariño-Ramírez, Leonardo Spouge, John L BMC Bioinformatics Research Article BACKGROUND: Biologically active sequence motifs often have positional preferences with respect to a genomic landmark. For example, many known transcription factor binding sites (TFBSs) occur within an interval [-300, 0] bases upstream of a transcription start site (TSS). Although some programs for identifying sequence motifs exploit positional information, most of them model it only implicitly and with ad hoc methods, making them unsuitable for general motif searches. RESULTS: A-GLAM, a user-friendly computer program for identifying sequence motifs, now incorporates a Bayesian model systematically combining sequence and positional information. A-GLAM's predictions with and without positional information were compared on two human TFBS datasets, each containing sequences corresponding to the interval [-2000, 0] bases upstream of a known TSS. A rigorous statistical analysis showed that positional information significantly improved the prediction of sequence motifs, and an extensive cross-validation study showed that A-GLAM's model was robust against mild misspecification of its parameters. As expected, when sequences in the datasets were successively truncated to the intervals [-1000, 0], [-500, 0] and [-250, 0], positional information aided motif prediction less and less, but never hurt it significantly. CONCLUSION: Although sequence truncation is a viable strategy when searching for biologically active motifs with a positional preference, a probabilistic model (used reasonably) generally provides a superior and more robust strategy, particularly when the sequence motifs' positional preferences are not well characterized. 
BioMed Central 2008-06-04 /pmc/articles/PMC2432075/ /pubmed/18533028 http://dx.doi.org/10.1186/1471-2105-9-262 Text en Copyright © 2008 Kim et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research Article
Kim, Nak-Kyeong
Tharakaraman, Kannan
Mariño-Ramírez, Leonardo
Spouge, John L
Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites
title Finding sequence motifs with Bayesian models incorporating positional information: an application to transcription factor binding sites
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2432075/
https://www.ncbi.nlm.nih.gov/pubmed/18533028
http://dx.doi.org/10.1186/1471-2105-9-262