SEARCHPATTOOL: a new method for mining the most specific frequent patterns for binding sites with application to prokaryotic DNA sequences

Bibliographic details
Main authors: Elloumi, Fathi; Nason, Martha
Format: Text
Language: English
Published: BioMed Central, 2007
Subjects: Methodology Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2082047/
https://www.ncbi.nlm.nih.gov/pubmed/17883842
http://dx.doi.org/10.1186/1471-2105-8-354
Collection: PubMed
Description:
BACKGROUND: Computational methods that predict transcription factor binding sites (TFBS) with exhaustive algorithms are guaranteed to find the best patterns, but they are often limited to short patterns or impose constraints on the pattern type. Many binding-site patterns in prokaryotic species are not well characterized, but they are known to be long (16–30 base pairs (bp)) and to contain at least 2 conserved bases. The short length of prokaryotic promoters (about 400 bp), together with our interest in studying small sets of genes, such as clusters of potentially co-regulated genes from microarray experiments, led to the development of a new exhaustive algorithm targeting these long patterns.
RESULTS: We present Searchpattool, a new method to search for and select the most specific (conservative) frequent patterns. The method imposes no restrictions on the density or structure of the pattern. The best patterns (motifs) are selected using several statistics, including a new application of a z-score based on the number of matching sequences. We compared Searchpattool with other well-known algorithms on a group of 14 Bacillus subtilis input sequences and found that, in our experiments, Searchpattool always achieved the best performance scores.
CONCLUSION: Searchpattool is a new method for discovering patterns for transcription factor binding sites in species or genes with short promoters. It outputs the most specific significant patterns and helps biologists choose the best candidates.
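The abstract states that motifs are ranked with several statistics, including a z-score based on the number of matching sequences rather than the number of matching sites. The following is a minimal Python sketch of that kind of statistic, under assumptions not taken from the paper: a pattern is a string of conserved bases with `N` marking unconstrained positions, only the forward strand is scanned, the background is an i.i.d. base model, and overlapping windows are treated as independent. All function names are hypothetical, and the paper's exact formula may differ.

```python
import math

# Illustrative sketch only (not the Searchpattool code). Assumptions:
# - a pattern is a string over A/C/G/T where "N" marks an unconstrained position;
# - only the forward strand is scanned;
# - the background is an i.i.d. base model given by `background`.


def matches_at(pattern: str, seq: str, start: int) -> bool:
    """True if every conserved position of `pattern` equals `seq` at offset `start`."""
    window = seq[start:start + len(pattern)]
    return len(window) == len(pattern) and all(
        p == "N" or p == s for p, s in zip(pattern, window)
    )


def sequence_matches(pattern: str, seq: str) -> bool:
    """True if `pattern` occurs at least once anywhere in `seq`."""
    return any(matches_at(pattern, seq, i) for i in range(len(seq) - len(pattern) + 1))


def count_matching_sequences(pattern: str, seqs: list[str]) -> int:
    """Number of input sequences containing at least one match (not number of sites)."""
    return sum(sequence_matches(pattern, s) for s in seqs)


def site_probability(pattern: str, background: dict[str, float]) -> float:
    """Probability that a single random window matches all conserved positions."""
    prob = 1.0
    for p in pattern:
        if p != "N":
            prob *= background[p]
    return prob


def prob_sequence_matches(pattern: str, seq_len: int, background: dict[str, float]) -> float:
    """Approximate P(at least one match), treating overlapping windows as independent."""
    q = site_probability(pattern, background)
    windows = max(seq_len - len(pattern) + 1, 0)
    return 1.0 - (1.0 - q) ** windows


def sequence_count_z_score(pattern: str, seqs: list[str], background: dict[str, float]) -> float:
    """z = (observed - expected) / sd for the number of matching sequences.

    Each sequence is treated as a Bernoulli trial with its own success probability,
    so the expected count and variance are sums over sequences (Poisson-binomial).
    """
    observed = count_matching_sequences(pattern, seqs)
    probs = [prob_sequence_matches(pattern, len(s), background) for s in seqs]
    expected = sum(probs)
    variance = sum(p * (1.0 - p) for p in probs)
    if variance == 0.0:
        raise ValueError("zero variance: pattern can never (or always) match")
    return (observed - expected) / math.sqrt(variance)


# Toy usage with made-up sequences (not data from the paper).
uniform = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}
toy_seqs = ["AATTGACAGG", "CCCCTTGTCACC", "GGGGGGGG"]
print(count_matching_sequences("TTGNCA", toy_seqs))  # 2
print(round(sequence_count_z_score("TTGNCA", toy_seqs, uniform), 1))
```

Counting sequences rather than sites means a pattern repeated many times within one promoter contributes only once to the observed count. How Searchpattool itself handles overlapping windows, the reverse strand, or the background model is not stated in this record.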
Record ID: pubmed-2082047
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: BMC Bioinformatics (Methodology Article), published 2007-09-20
Record date: 2007-11-20
Copyright: © 2007 Elloumi and Nason; licensee BioMed Central Ltd.
License: This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.