
PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences

BACKGROUND: This paper addresses the problem of discovering transcription factor binding sites in heterogeneous sequence data, which includes regulatory sequences of one or more genes, as well as their orthologs in other species. RESULTS: We propose an algorithm that integrates two important aspects...


Bibliographic Details
Main Authors: Sinha, Saurabh; Blanchette, Mathieu; Tompa, Martin
Format: Text
Language: English
Published: BioMed Central 2004
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC534098/
https://www.ncbi.nlm.nih.gov/pubmed/15511292
http://dx.doi.org/10.1186/1471-2105-5-170
author Sinha, Saurabh
Blanchette, Mathieu
Tompa, Martin
collection PubMed
description BACKGROUND: This paper addresses the problem of discovering transcription factor binding sites in heterogeneous sequence data, which includes regulatory sequences of one or more genes, as well as their orthologs in other species. RESULTS: We propose an algorithm that integrates two important aspects of a motif's significance – overrepresentation and cross-species conservation – into one probabilistic score. The algorithm allows the input orthologous sequences to be related by any user-specified phylogenetic tree. It is based on the Expectation-Maximization technique, and scales well with the number of species and the length of input sequences. We evaluate the algorithm on synthetic data, and also present results for data sets from yeast, fly, and human. CONCLUSIONS: The results demonstrate that the new approach improves motif discovery by exploiting multiple species information.
format Text
id pubmed-534098
institution National Center for Biotechnology Information
language English
publishDate 2004
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-534098 2004-11-28 PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences. Sinha, Saurabh; Blanchette, Mathieu; Tompa, Martin. BMC Bioinformatics. Research Article. BioMed Central 2004-10-28 /pmc/articles/PMC534098/ /pubmed/15511292 http://dx.doi.org/10.1186/1471-2105-5-170 Text en Copyright © 2004 Sinha et al; licensee BioMed Central Ltd.
title PhyME: A probabilistic algorithm for finding motifs in sets of orthologous sequences
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC534098/
https://www.ncbi.nlm.nih.gov/pubmed/15511292
http://dx.doi.org/10.1186/1471-2105-5-170