A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs
BACKGROUND: Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorit...
Main authors: | Seitzer, Phillip; Wilbanks, Elizabeth G; Larsen, David J; Facciotti, Marc T |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2012 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3542263/ https://www.ncbi.nlm.nih.gov/pubmed/23181585 http://dx.doi.org/10.1186/1471-2105-13-317 |
_version_ | 1782255480425414656 |
---|---|
author | Seitzer, Phillip Wilbanks, Elizabeth G Larsen, David J Facciotti, Marc T |
author_sort | Seitzer, Phillip |
collection | PubMed |
description | BACKGROUND: Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. RESULTS: We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm; each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. CONCLUSIONS: Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/. |
format | Online Article Text |
id | pubmed-3542263 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2012 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3542263 2013-01-11 A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs Seitzer, Phillip Wilbanks, Elizabeth G Larsen, David J Facciotti, Marc T BMC Bioinformatics Research Article BACKGROUND: Discovery of functionally significant short, statistically overrepresented subsequence patterns (motifs) in a set of sequences is a challenging problem in bioinformatics. Oftentimes, not all sequences in the set contain a motif. These non-motif-containing sequences complicate the algorithmic discovery of motifs. Filtering the non-motif-containing sequences from the larger set of sequences while simultaneously determining the identity of the motif is, therefore, desirable and a non-trivial problem in motif discovery research. RESULTS: We describe MotifCatcher, a framework that extends the sensitivity of existing motif-finding tools by employing random sampling to effectively remove non-motif-containing sequences from the motif search. We developed two implementations of our algorithm; each built around a commonly used motif-finding tool, and applied our algorithm to three diverse chromatin immunoprecipitation (ChIP) data sets. In each case, the motif finder with the MotifCatcher extension demonstrated improved sensitivity over the motif finder alone. Our approach organizes candidate functionally significant discovered motifs into a tree, which allowed us to make additional insights. In all cases, we were able to support our findings with experimental work from the literature. CONCLUSIONS: Our framework demonstrates that additional processing at the sequence entry level can significantly improve the performance of existing motif-finding tools. For each biological data set tested, we were able to propose novel biological hypotheses supported by experimental work from the literature. Specifically, in Escherichia coli, we suggested binding site motifs for 6 non-traditional LexA protein binding sites; in Saccharomyces cerevisiae, we hypothesize 2 disparate mechanisms for novel binding sites of the Cse4p protein; and in Halobacterium sp. NRC-1, we discovered subtle differences in a general transcription factor (GTF) binding site motif across several data sets. We suggest that small differences in our discovered motif could confer specificity for one or more homologous GTF proteins. We offer a free implementation of the MotifCatcher software package at http://www.bme.ucdavis.edu/facciotti/resources_data/software/. BioMed Central 2012-11-27 /pmc/articles/PMC3542263/ /pubmed/23181585 http://dx.doi.org/10.1186/1471-2105-13-317 Text en Copyright ©2012 Seitzer et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
title | A Monte Carlo-based framework enhances the discovery and interpretation of regulatory sequence motifs |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3542263/ https://www.ncbi.nlm.nih.gov/pubmed/23181585 http://dx.doi.org/10.1186/1471-2105-13-317 |