A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites
Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed us...
Main Authors: | Naughton, Brian T.; Fratkin, Eugene; Batzoglou, Serafim; Brutlag, Douglas L. |
---|---|
Format: | Text |
Language: | English |
Published: | Oxford University Press 2006 |
Subjects: | Computational Biology |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635261/ https://www.ncbi.nlm.nih.gov/pubmed/17041233 http://dx.doi.org/10.1093/nar/gkl585 |
_version_ | 1782130672488415232 |
---|---|
author | Naughton, Brian T. Fratkin, Eugene Batzoglou, Serafim Brutlag, Douglas L. |
author_facet | Naughton, Brian T. Fratkin, Eugene Batzoglou, Serafim Brutlag, Douglas L. |
author_sort | Naughton, Brian T. |
collection | PubMed |
description | Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar k-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate k-mer is part of a motif or not, we base our decision not on how well the k-mer conforms to a model of the motif as a whole, but how similar it is to specific, known k-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs. |
format | Text |
id | pubmed-1635261 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-16352612006-12-26 A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites Naughton, Brian T. Fratkin, Eugene Batzoglou, Serafim Brutlag, Douglas L. Nucleic Acids Res Computational Biology Given a set of known binding sites for a specific transcription factor, it is possible to build a model of the transcription factor binding site, usually called a motif model, and use this model to search for other sites that bind the same transcription factor. Typically, this search is performed using a position-specific scoring matrix (PSSM), also known as a position weight matrix. In this paper we analyze a set of eukaryotic transcription factor binding sites and show that there is extensive clustering of similar k-mers in eukaryotic motifs, owing to both functional and evolutionary constraints. The apparent limitations of probabilistic models in representing complex nucleotide dependencies lead us to a graph-based representation of motifs. When deciding whether a candidate k-mer is part of a motif or not, we base our decision not on how well the k-mer conforms to a model of the motif as a whole, but how similar it is to specific, known k-mers in the motif. We elucidate the reasons why we expect graph-based methods to perform well on motif data. Our MotifScan algorithm shows greatly improved performance over the prevalent PSSM-based method for the detection of eukaryotic motifs. Oxford University Press 2006-11 2006-11-13 /pmc/articles/PMC1635261/ /pubmed/17041233 http://dx.doi.org/10.1093/nar/gkl585 Text en © 2006 The Author(s) |
spellingShingle | Computational Biology Naughton, Brian T. Fratkin, Eugene Batzoglou, Serafim Brutlag, Douglas L. A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites |
title | A graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites |
title_sort | graph-based motif detection algorithm models complex nucleotide dependencies in transcription factor binding sites |
topic | Computational Biology |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1635261/ https://www.ncbi.nlm.nih.gov/pubmed/17041233 http://dx.doi.org/10.1093/nar/gkl585 |
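
The description above contrasts scoring a candidate k-mer against a model of the whole motif (a PSSM) with scoring it by similarity to specific known k-mers. The sketch below illustrates that contrast on a toy example. It is not the MotifScan algorithm from the paper: the function names, the pseudocount, the uniform background, the Hamming-distance similarity, and the four toy sites are all assumptions made for illustration.

```python
import math
from collections import Counter

ALPHABET = "ACGT"


def build_pssm(sites, pseudocount=1.0, background=0.25):
    """Build a log-odds PSSM: one dict per position, base -> log2(p / background).

    Assumes equal-length sites and a uniform background (illustrative choices).
    """
    total = len(sites) + pseudocount * len(ALPHABET)
    pssm = []
    for i in range(len(sites[0])):
        counts = Counter(site[i] for site in sites)
        pssm.append({
            base: math.log2((counts[base] + pseudocount) / total / background)
            for base in ALPHABET
        })
    return pssm


def pssm_score(pssm, kmer):
    """Sum per-position log-odds; positions are treated as independent."""
    return sum(column[base] for column, base in zip(pssm, kmer))


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def nearest_site_similarity(sites, kmer):
    """Fraction of positions matching the closest known site (hypothetical measure)."""
    return 1.0 - min(hamming(site, kmer) for site in sites) / len(kmer)


# Toy motif with two clusters of sites, i.e. strong dependencies between positions.
sites = ["AATT", "AATT", "CCGG", "CCGG"]
pssm = build_pssm(sites)

known = "AATT"    # an actual site
chimera = "AAGG"  # mixes halves of the two clusters; never observed

# The PSSM cannot tell the two apart, because each position looks fine in isolation.
print(math.isclose(pssm_score(pssm, known), pssm_score(pssm, chimera)))  # True

# Similarity to specific known k-mers separates them.
print(nearest_site_similarity(sites, known))    # 1.0
print(nearest_site_similarity(sites, chimera))  # 0.5
```

In this toy case the chimera scores exactly as well as a real site under the PSSM, while its similarity to the nearest known site drops to 0.5. This is the kind of nucleotide dependency that the description says motivates a graph-based representation.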