Cargando…

Refining motifs by improving information content scores using neighborhood profile search

The main goal of the motif finding problem is to detect novel, over-represented unknown signals in a set of sequences (e.g. transcription factor binding sites in a genome). The most widely used algorithms for finding motifs obtain a generative probabilistic representation of these over-represented s...

Descripción completa

Detalles Bibliográficos
Autores principales: Reddy, Chandan K, Weng, Yao-Chung, Chiang, Hsiao-Dong
Formato: Texto
Lenguaje:English
Publicado: BioMed Central 2006
Materias:
Acceso en línea:https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1687185/
https://www.ncbi.nlm.nih.gov/pubmed/17129371
http://dx.doi.org/10.1186/1748-7188-1-23
_version_ 1782131186402852864
author Reddy, Chandan K
Weng, Yao-Chung
Chiang, Hsiao-Dong
author_facet Reddy, Chandan K
Weng, Yao-Chung
Chiang, Hsiao-Dong
author_sort Reddy, Chandan K
collection PubMed
description The main goal of the motif finding problem is to detect novel, over-represented unknown signals in a set of sequences (e.g. transcription factor binding sites in a genome). The most widely used algorithms for finding motifs obtain a generative probabilistic representation of these over-represented signals and try to discover profiles that maximize the information content score. Although these profiles form a very powerful representation of the signals, the major difficulty arises from the fact that the best motif corresponds to the global maximum of a non-convex continuous function. Popular algorithms like Expectation Maximization (EM) and Gibbs sampling tend to be very sensitive to the initial guesses and are known to converge to the nearest local maximum very quickly. In order to improve the quality of the results, EM is used with multiple random starts or any other powerful stochastic global methods that might yield promising initial guesses (like projection algorithms). Global methods do not necessarily give initial guesses in the convergence region of the best local maximum but rather suggest that a promising solution is in the neighborhood region. In this paper, we introduce a novel optimization framework that searches the neighborhood regions of the initial alignment in a systematic manner to explore the multiple local optimal solutions. This effective search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. Our results show that the popularly used EM algorithm often converges to sub-optimal solutions which can be significantly improved by the proposed neighborhood profile search. Based on experiments using both synthetic and real datasets, our method demonstrates significant improvements in the information content scores of the probabilistic models. The proposed method also gives the flexibility in using different local solvers and global methods depending on their suitability for some specific datasets.
format Text
id pubmed-1687185
institution National Center for Biotechnology Information
language English
publishDate 2006
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-16871852006-12-19 Refining motifs by improving information content scores using neighborhood profile search Reddy, Chandan K Weng, Yao-Chung Chiang, Hsiao-Dong Algorithms Mol Biol Research The main goal of the motif finding problem is to detect novel, over-represented unknown signals in a set of sequences (e.g. transcription factor binding sites in a genome). The most widely used algorithms for finding motifs obtain a generative probabilistic representation of these over-represented signals and try to discover profiles that maximize the information content score. Although these profiles form a very powerful representation of the signals, the major difficulty arises from the fact that the best motif corresponds to the global maximum of a non-convex continuous function. Popular algorithms like Expectation Maximization (EM) and Gibbs sampling tend to be very sensitive to the initial guesses and are known to converge to the nearest local maximum very quickly. In order to improve the quality of the results, EM is used with multiple random starts or any other powerful stochastic global methods that might yield promising initial guesses (like projection algorithms). Global methods do not necessarily give initial guesses in the convergence region of the best local maximum but rather suggest that a promising solution is in the neighborhood region. In this paper, we introduce a novel optimization framework that searches the neighborhood regions of the initial alignment in a systematic manner to explore the multiple local optimal solutions. This effective search is achieved by transforming the original optimization problem into its corresponding dynamical system and estimating the practical stability boundary of the local maximum. Our results show that the popularly used EM algorithm often converges to sub-optimal solutions which can be significantly improved by the proposed neighborhood profile search. Based on experiments using both synthetic and real datasets, our method demonstrates significant improvements in the information content scores of the probabilistic models. The proposed method also gives the flexibility in using different local solvers and global methods depending on their suitability for some specific datasets. BioMed Central 2006-11-27 /pmc/articles/PMC1687185/ /pubmed/17129371 http://dx.doi.org/10.1186/1748-7188-1-23 Text en Copyright © 2006 Reddy et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License ( (http://creativecommons.org/licenses/by/2.0) ), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Reddy, Chandan K
Weng, Yao-Chung
Chiang, Hsiao-Dong
Refining motifs by improving information content scores using neighborhood profile search
title Refining motifs by improving information content scores using neighborhood profile search
title_full Refining motifs by improving information content scores using neighborhood profile search
title_fullStr Refining motifs by improving information content scores using neighborhood profile search
title_full_unstemmed Refining motifs by improving information content scores using neighborhood profile search
title_short Refining motifs by improving information content scores using neighborhood profile search
title_sort refining motifs by improving information content scores using neighborhood profile search
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1687185/
https://www.ncbi.nlm.nih.gov/pubmed/17129371
http://dx.doi.org/10.1186/1748-7188-1-23
work_keys_str_mv AT reddychandank refiningmotifsbyimprovinginformationcontentscoresusingneighborhoodprofilesearch
AT wengyaochung refiningmotifsbyimprovinginformationcontentscoresusingneighborhoodprofilesearch
AT chianghsiaodong refiningmotifsbyimprovinginformationcontentscoresusingneighborhoodprofilesearch