DISCOVER: a feature-based discriminative method for motif search in complex genomes

Bibliographic Details
Main authors:	Fu, Wenjie; Ray, Pradipta; Xing, Eric P.
Format:	Text
Language:	English
Published:	Oxford University Press 2009
Subjects:	Ismb/Eccb 2009 Conference Proceedings June 27 to July 2, 2009, Stockholm, Sweden
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2687984/ https://www.ncbi.nlm.nih.gov/pubmed/19478006 http://dx.doi.org/10.1093/bioinformatics/btp230

Description

Motivation: Identifying transcription factor binding sites (TFBSs) encoding complex regulatory signals in metazoan genomes remains a challenging problem in computational genomics. Due to degeneracy of nucleotide content among binding site instances or motifs, and intricate ‘grammatical organization’ of motifs within cis-regulatory modules (CRMs), extant pattern matching-based in silico motif search methods often suffer from impractically high false positive rates, especially in the context of analyzing large genomic datasets, and noisy position weight matrices which characterize binding sites. Here, we try to address this problem by using a framework to maximally utilize the information content of the genomic DNA in the region of query, taking cues from values of various biologically meaningful genetic and epigenetic factors in the query region such as clade-specific evolutionary parameters, presence/absence of nearby coding regions, etc. We present a new method for TFBS prediction in metazoan genomes that utilizes both the CRM architecture of sequences and a variety of features of individual motifs. Our proposed approach is based on a discriminative probabilistic model known as conditional random fields that explicitly optimizes the predictive probability of motif presence in large sequences, based on the joint effect of all such features.

Results: This model overcomes weaknesses in earlier methods based on less effective statistical formalisms that are sensitive to spurious signals in the data. We evaluate our method on both simulated CRMs and real Drosophila sequences in comparison with a wide spectrum of existing models, and outperform the state of the art by 22% in F1 score.

Availability and Implementation: The code is publicly available at http://www.sailing.cs.cmu.edu/discover.html.

Contact: epxing@cs.cmu.edu

Supplementary information: Supplementary data are available at Bioinformatics online.

Record Details
Journal:	Bioinformatics
Record ID:	pubmed-2687984
Collection:	PubMed
Institution:	National Center for Biotechnology Information
Record format:	MEDLINE/PubMed
Dates in record:	2009-06-02, 2009-06-15, 2009-05-27
License:	© 2009 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/2.0/uk/), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
