Cargando…
Refining motifs by improving information content scores using neighborhood profile search
The main goal of the motif finding problem is to detect novel, over-represented unknown signals in a set of sequences (e.g. transcription factor binding sites in a genome). The most widely used algorithms for finding motifs obtain a generative probabilistic representation of these over-represented s...
Autores principales: | , , |
---|---|
Formato: | Texto |
Lenguaje: | English |
Publicado: |
BioMed Central
2006
|
Materias: | |
Acceso en línea: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1687185/ https://www.ncbi.nlm.nih.gov/pubmed/17129371 http://dx.doi.org/10.1186/1748-7188-1-23 |
_version_ | 1782131186402852864 |
---|---|
author | Reddy, Chandan K Weng, Yao-Chung Chiang, Hsiao-Dong |
author_facet | Reddy, Chandan K Weng, Yao-Chung Chiang, Hsiao-Dong |
author_sort | Reddy, Chandan K |
collection | PubMed |
description | The main goal of the motif finding problem is to detect novel, over-represented unknown signals in a set of sequences (e.g. transcription factor binding sites in a genome). The most widely used algorithms for finding motifs obtain a generative probabilistic representation of these over-represented signals and try to discover profiles that maximize the information content score. Although these profiles form a very powerful representation of the signals, the major difficulty arises from the fact that the best motif corresponds to the global maximum of a non-convex continuous function. Popular algorithms like Expectation Maximization (EM) and Gibbs sampling tend to be very sensitive to the initial guesses and are known to converge to the nearest local maximum very quickly. In order to improve the quality of the results, EM is used with multiple random starts or any other powerful stochastic global methods that might yield promising initial guesses (like projection algorithms). Global methods do not necessarily give initial guesses in the convergence region of the best local maximum but rather suggest that a promising solution is in the neighborhood region. In this paper, we introduce a novel optimization framework that searches the neighborhood regions of the initial alignment in a systematic manner to explore the multiple local optimal solutions. This effective search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. Our results show that the popularly used EM algorithm often converges to sub-optimal solutions which can be significantly improved by the proposed neighborhood profile search. Based on experiments using both synthetic and real datasets, our method demonstrates significant improvements in the information content scores of the probabilistic models. The proposed method also gives the flexibility in using different local solvers and global methods depending on their suitability for some specific datasets. |
format | Text |
id | pubmed-1687185 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2006 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-16871852006-12-19 Refining motifs by improving information content scores using neighborhood profile search Reddy, Chandan K Weng, Yao-Chung Chiang, Hsiao-Dong Algorithms Mol Biol Research The main goal of the motif finding problem is to detect novel, over-represented unknown signals in a set of sequences (e.g. transcription factor binding sites in a genome). The most widely used algorithms for finding motifs obtain a generative probabilistic representation of these over-represented signals and try to discover profiles that maximize the information content score. Although these profiles form a very powerful representation of the signals, the major difficulty arises from the fact that the best motif corresponds to the global maximum of a non-convex continuous function. Popular algorithms like Expectation Maximization (EM) and Gibbs sampling tend to be very sensitive to the initial guesses and are known to converge to the nearest local maximum very quickly. In order to improve the quality of the results, EM is used with multiple random starts or any other powerful stochastic global methods that might yield promising initial guesses (like projection algorithms). Global methods do not necessarily give initial guesses in the convergence region of the best local maximum but rather suggest that a promising solution is in the neighborhood region. In this paper, we introduce a novel optimization framework that searches the neighborhood regions of the initial alignment in a systematic manner to explore the multiple local optimal solutions. This effective search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. Our results show that the popularly used EM algorithm often converges to sub-optimal solutions which can be significantly improved by the proposed neighborhood profile search. Based on experiments using both synthetic and real datasets, our method demonstrates significant improvements in the information content scores of the probabilistic models. The proposed method also gives the flexibility in using different local solvers and global methods depending on their suitability for some specific datasets. BioMed Central 2006-11-27 /pmc/articles/PMC1687185/ /pubmed/17129371 http://dx.doi.org/10.1186/1748-7188-1-23 Text en Copyright © 2006 Reddy et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Reddy, Chandan K Weng, Yao-Chung Chiang, Hsiao-Dong Refining motifs by improving information content scores using neighborhood profile search |
title | Refining motifs by improving information content scores using neighborhood profile search |
title_full | Refining motifs by improving information content scores using neighborhood profile search |
title_fullStr | Refining motifs by improving information content scores using neighborhood profile search |
title_full_unstemmed | Refining motifs by improving information content scores using neighborhood profile search |
title_short | Refining motifs by improving information content scores using neighborhood profile search |
title_sort | refining motifs by improving information content scores using neighborhood profile search |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1687185/ https://www.ncbi.nlm.nih.gov/pubmed/17129371 http://dx.doi.org/10.1186/1748-7188-1-23 |
work_keys_str_mv | AT reddychandank refiningmotifsbyimprovinginformationcontentscoresusingneighborhoodprofilesearch AT wengyaochung refiningmotifsbyimprovinginformationcontentscoresusingneighborhoodprofilesearch AT chianghsiaodong refiningmotifsbyimprovinginformationcontentscoresusingneighborhoodprofilesearch |