Discriminative motif discovery in DNA and protein sequences using the DEME algorithm
BACKGROUND: Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for...
Main Authors: | Redhead, Emma, Bailey, Timothy L |
---|---|
Format: | Text |
Language: | English |
Published: | BioMed Central 2007 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2194741/ https://www.ncbi.nlm.nih.gov/pubmed/17937785 http://dx.doi.org/10.1186/1471-2105-8-385 |
_version_ | 1782147686599753728 |
---|---|
author | Redhead, Emma Bailey, Timothy L |
author_facet | Redhead, Emma Bailey, Timothy L |
author_sort | Redhead, Emma |
collection | PubMed |
description | BACKGROUND: Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. RESULTS: We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. CONCLUSION: Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at |
format | Text |
id | pubmed-2194741 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2007 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-21947412008-01-14 Discriminative motif discovery in DNA and protein sequences using the DEME algorithm Redhead, Emma Bailey, Timothy L BMC Bioinformatics Research Article BACKGROUND: Motif discovery aims to detect short, highly conserved patterns in a collection of unaligned DNA or protein sequences. Discriminative motif finding algorithms aim to increase the sensitivity and selectivity of motif discovery by utilizing a second set of sequences, and searching only for patterns that can differentiate the two sets of sequences. Potential applications of discriminative motif discovery include discovering transcription factor binding site motifs in ChIP-chip data and finding protein motifs involved in thermal stability using sets of orthologous proteins from thermophilic and mesophilic organisms. RESULTS: We describe DEME, a discriminative motif discovery algorithm for use with protein and DNA sequences. Input to DEME is two sets of sequences: a "positive" set and a "negative" set. DEME represents motifs using a probabilistic model, and uses a novel combination of global and local search to find the motif that optimally discriminates between the two sets of sequences. DEME is unique among discriminative motif finders in that it uses an informative Bayesian prior on protein motif columns, allowing it to incorporate prior knowledge of residue characteristics. We also introduce four synthetic discriminative motif discovery problems that are designed for evaluating discriminative motif finders in various biologically motivated contexts. We test DEME using these synthetic problems and on two biological problems: finding yeast transcription factor binding motifs in ChIP-chip data, and finding motifs that discriminate between groups of thermophilic and mesophilic orthologous proteins. CONCLUSION: Using artificial data, we show that DEME is more effective than a non-discriminative approach when there are "decoy" motifs or when a variant of the motif is present in the "negative" sequences. With real data, we show that DEME is as good as, but not better than, non-discriminative algorithms at discovering yeast transcription factor binding motifs. We also show that DEME can find highly informative thermal-stability protein motifs. Binaries for the stand-alone program DEME are free for academic use and are available at BioMed Central 2007-10-15 /pmc/articles/PMC2194741/ /pubmed/17937785 http://dx.doi.org/10.1186/1471-2105-8-385 Text en Copyright © 2007 Redhead and Bailey; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Redhead, Emma Bailey, Timothy L Discriminative motif discovery in DNA and protein sequences using the DEME algorithm |
title | Discriminative motif discovery in DNA and protein sequences using the DEME algorithm |
title_full | Discriminative motif discovery in DNA and protein sequences using the DEME algorithm |
title_fullStr | Discriminative motif discovery in DNA and protein sequences using the DEME algorithm |
title_full_unstemmed | Discriminative motif discovery in DNA and protein sequences using the DEME algorithm |
title_short | Discriminative motif discovery in DNA and protein sequences using the DEME algorithm |
title_sort | discriminative motif discovery in dna and protein sequences using the deme algorithm |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2194741/ https://www.ncbi.nlm.nih.gov/pubmed/17937785 http://dx.doi.org/10.1186/1471-2105-8-385 |
work_keys_str_mv | AT redheademma discriminativemotifdiscoveryindnaandproteinsequencesusingthedemealgorithm AT baileytimothyl discriminativemotifdiscoveryindnaandproteinsequencesusingthedemealgorithm |