A Novel Bayesian DNA Motif Comparison Method for Clustering and Retrieval

Bibliographic Details
Main authors: Habib, Naomi; Kaplan, Tommy; Margalit, Hanah; Friedman, Nir
Format: Text
Language: English
Published: Public Library of Science 2008
Subjects: Research Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2265534/
https://www.ncbi.nlm.nih.gov/pubmed/18463706
http://dx.doi.org/10.1371/journal.pcbi.1000010
Collection: PubMed
Description: Characterizing the DNA-binding specificities of transcription factors is a key problem in computational biology that has been addressed by multiple algorithms. These usually take as input sequences that are putatively bound by the same factor and output one or more DNA motifs. A common practice is to apply several such algorithms simultaneously to improve coverage at the price of redundancy. In interpreting such results, two tasks are crucial: clustering of redundant motifs, and attributing the motifs to transcription factors by retrieval of similar motifs from previously characterized motif libraries. Both tasks inherently involve motif comparison. Here we present a novel method for comparing and merging motifs, based on Bayesian probabilistic principles. This method takes into account both the similarity in positional nucleotide distributions of the two motifs and their dissimilarity to the background distribution. We demonstrate the use of the new comparison method as a basis for motif clustering and retrieval procedures, and compare it to several commonly used alternatives. Our results show that the new method outperforms other available methods in accuracy and sensitivity. We incorporated the resulting motif clustering and retrieval procedures in a large-scale automated pipeline for analyzing DNA motifs. This pipeline integrates the results of various DNA motif discovery algorithms and automatically merges redundant motifs from multiple training sets into a coherent annotated library of motifs. Application of this pipeline to recent genome-wide transcription factor location data in S. cerevisiae successfully identified DNA motifs in a manner that is as good as semi-automated analysis reported in the literature. Moreover, we show how this analysis elucidates the mechanisms of condition-specific preferences of transcription factors.
Record ID: pubmed-2265534
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: PLoS Comput Biol
Article type: Research Article
Publication date: 2008-02-29
Rights: Habib et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
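
Illustrative note: the description says the comparison method weighs both how similar two motifs' positional nucleotide distributions are and how different they are from the background distribution. The sketch below shows one way such a two-part Bayesian score could be written for aligned count columns. It is not taken from the article: the function names, the symmetric Dirichlet prior `alpha`, the uniform background `bg`, and the exact form of the score are all assumptions made for illustration, and alignment offsets and reverse complements are not handled.

```python
from math import lgamma, log

def log_marginal(counts, alpha):
    """Log marginal likelihood of observed nucleotide counts (A, C, G, T)
    under a Dirichlet prior, treating the observations as an ordered sequence."""
    out = lgamma(sum(alpha)) - lgamma(sum(alpha) + sum(counts))
    for n, a in zip(counts, alpha):
        out += lgamma(a + n) - lgamma(a)
    return out

def log_background(counts, bg):
    """Log likelihood of the counts under a fixed background distribution."""
    return sum(n * log(p) for n, p in zip(counts, bg))

def column_score(c1, c2, alpha=(1.0, 1.0, 1.0, 1.0), bg=(0.25, 0.25, 0.25, 0.25)):
    """Hypothetical two-part score for one pair of aligned columns.

    Part 1: common source vs. two independent sources (similarity).
    Part 2: common source vs. background (informativeness).
    """
    merged = [a + b for a, b in zip(c1, c2)]
    common = log_marginal(merged, alpha)
    separate = log_marginal(c1, alpha) + log_marginal(c2, alpha)
    background = log_background(merged, bg)
    return (common - separate) + (common - background)

def motif_score(m1, m2, **kwargs):
    """Sum of column scores for two motifs of equal length, already aligned."""
    if len(m1) != len(m2):
        raise ValueError("motifs must be aligned to the same length")
    return sum(column_score(a, b, **kwargs) for a, b in zip(m1, m2))

if __name__ == "__main__":
    a_rich = [10, 0, 0, 0]
    a_rich_too = [9, 1, 0, 0]
    g_rich = [0, 0, 10, 0]
    print(column_score(a_rich, a_rich_too))  # similar columns
    print(column_score(a_rich, g_rich))      # dissimilar columns
```

Under these assumptions, two columns dominated by the same nucleotide score higher than two columns dominated by different nucleotides, and a merged column that looks like background contributes little through the second term.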