
rMotifGen: random motif generator for DNA and protein sequences

BACKGROUND: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detecti...


Bibliographic Details
Main authors: Rouchka, Eric C, Hardin, C Timothy
Format: Text
Language: English
Published: BioMed Central 2007
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1963340/
https://www.ncbi.nlm.nih.gov/pubmed/17683637
http://dx.doi.org/10.1186/1471-2105-8-292
_version_ 1782134640585211904
author Rouchka, Eric C
Hardin, C Timothy
author_facet Rouchka, Eric C
Hardin, C Timothy
author_sort Rouchka, Eric C
collection PubMed
description BACKGROUND: Detection of short, subtle conserved motif regions within a set of related DNA or amino acid sequences can lead to discoveries about important regulatory domains such as transcription factor and DNA binding sites as well as conserved protein domains. In order to help assess motif detection algorithms on motifs with varying properties and levels of conservation, we have developed a computational tool, rMotifGen, with the sole purpose of generating a number of random DNA or protein sequences containing short sequence motifs. Each motif consensus can be user-defined, randomly generated, or created from a position-specific scoring matrix (PSSM). Insertions and mutations within these motifs are created according to user-defined parameters and substitution matrices. The resulting sequences can be helpful in mutational simulations and in testing the limits of motif detection algorithms. RESULTS: Two implementations of rMotifGen have been created, one providing a graphical user interface (GUI) for random motif construction, and the other serving as a command line interface. The second implementation has the added advantages of platform independence and being able to be called in a batch mode. rMotifGen was used to construct sample sets of sequences containing DNA motifs and amino acid motifs that were then tested against the Gibbs sampler and MEME packages. CONCLUSION: rMotifGen provides an efficient and convenient method for creating random DNA or amino acid sequences with a variable number of motifs, where the instance of each motif can be incorporated using a position-specific scoring matrix (PSSM) or by creating an instance mutated from its corresponding consensus using an evolutionary model based on substitution matrices. rMotifGen is freely available at: .
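The description above names two ways of producing a motif instance: drawing it from a position-specific matrix, or mutating it from a consensus. The following is a minimal Python sketch of both ideas, not rMotifGen's actual implementation. All function names are hypothetical, the matrix is assumed to hold per-position probabilities in A, C, G, T order rather than log-odds scores, and the mutation step uses a single uniform rate as a stand-in for the substitution-matrix model the abstract mentions. Insertions are not modelled.

```python
import random

DNA = "ACGT"

def random_sequence(length, rng, alphabet=DNA):
    """Return a uniformly random background sequence."""
    return "".join(rng.choice(alphabet) for _ in range(length))

def sample_from_ppm(ppm, rng, alphabet=DNA):
    """Draw one motif instance, one matrix column (position) at a time.

    Assumption: each column lists probabilities in alphabet order.
    """
    return "".join(rng.choices(alphabet, weights=col, k=1)[0] for col in ppm)

def mutate_consensus(consensus, rate, rng, alphabet=DNA):
    """Replace each position with a different letter with probability `rate`.

    Assumption: uniform substitution, a simplification of a substitution matrix.
    """
    out = []
    for ch in consensus:
        if rng.random() < rate:
            out.append(rng.choice([a for a in alphabet if a != ch]))
        else:
            out.append(ch)
    return "".join(out)

def plant_motif(length, instance, rng, alphabet=DNA):
    """Embed `instance` at a random offset in a random background sequence."""
    start = rng.randrange(length - len(instance) + 1)
    background = random_sequence(length, rng, alphabet)
    seq = background[:start] + instance + background[start + len(instance):]
    return seq, start

# Illustrative 4-position matrix; the values are made up for this sketch.
rng = random.Random(0)
ppm = [
    [0.90, 0.03, 0.03, 0.04],
    [0.05, 0.05, 0.85, 0.05],
    [0.10, 0.10, 0.10, 0.70],
    [0.97, 0.01, 0.01, 0.01],
]
for _ in range(3):
    seq, start = plant_motif(30, sample_from_ppm(ppm, rng), rng)
    print(start, seq)
```

Seeding the generator keeps a test set reproducible, which matters when the same sequences are fed to several motif finders for comparison.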
format Text
id pubmed-1963340
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-1963340 2007-09-01 rMotifGen: random motif generator for DNA and protein sequences Rouchka, Eric C Hardin, C Timothy BMC Bioinformatics Software
BioMed Central 2007-08-07 /pmc/articles/PMC1963340/ /pubmed/17683637 http://dx.doi.org/10.1186/1471-2105-8-292 Text en Copyright © 2007 Rouchka and Hardin; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Software
Rouchka, Eric C
Hardin, C Timothy
rMotifGen: random motif generator for DNA and protein sequences
title rMotifGen: random motif generator for DNA and protein sequences
title_full rMotifGen: random motif generator for DNA and protein sequences
title_fullStr rMotifGen: random motif generator for DNA and protein sequences
title_full_unstemmed rMotifGen: random motif generator for DNA and protein sequences
title_short rMotifGen: random motif generator for DNA and protein sequences
title_sort rmotifgen: random motif generator for dna and protein sequences
topic Software
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1963340/
https://www.ncbi.nlm.nih.gov/pubmed/17683637
http://dx.doi.org/10.1186/1471-2105-8-292
work_keys_str_mv AT rouchkaericc rmotifgenrandommotifgeneratorfordnaandproteinsequences
AT hardinctimothy rmotifgenrandommotifgeneratorfordnaandproteinsequences