NestedMICA as an ab initio protein motif discovery tool

Bibliographic Details
Main Authors: Doğruel, Mutlu; Down, Thomas A; Hubbard, Tim JP
Format: Text
Language: English
Published: BioMed Central, 2008
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2267705/
https://www.ncbi.nlm.nih.gov/pubmed/18194537
http://dx.doi.org/10.1186/1471-2105-9-19
Description:
BACKGROUND: Discovering overrepresented patterns in amino acid sequences is an important step in identifying functional elements in proteins. We adapted and extended NestedMICA, an ab initio motif finder originally developed for finding transcription factor binding site motifs, to find short protein signals, and compared its performance with that of another popular protein motif finder, MEME. NestedMICA, an open-source protein motif discovery tool written in Java, is driven by a Monte Carlo technique called Nested Sampling. It uses multi-class sequence background models to represent the different "uninteresting" parts of sequences that do not contain motifs of interest. To assess NestedMICA as a protein motif finder, we tested it on synthetic datasets produced by spiking instances of known motifs into a randomly selected set of protein sequences. NestedMICA was also tested on a biologically authentic test set, where we evaluated its performance with respect to varying sequence length.

RESULTS: In general, NestedMICA recovered most of the short (3–9 residues long) test protein motifs spiked into a set of sequences at different frequencies. We also showed that it can find multiple motifs at the same time. In all the assessment experiments we carried out, its overall motif discovery performance was better than that of MEME.

CONCLUSION: NestedMICA proved to be a robust and sensitive ab initio protein motif finder, even for relatively short motifs that occur in only a small fraction of sequences.

AVAILABILITY: NestedMICA is available under the Lesser GPL open-source license from:
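NestedMICA is described as being driven by Nested Sampling. The general technique (Skilling's nested sampling) can be sketched on a toy one-dimensional problem: keep a set of "live" points drawn from the prior, repeatedly discard the lowest-likelihood point, shrink the enclosed prior volume by an expected factor, and accumulate the evidence integral. This is only a generic illustration under stated assumptions: the function and parameter names (`nested_sampling`, `n_live`, `n_iter`) are hypothetical, the replacement step uses naive rejection sampling, and nothing here reflects NestedMICA's Java implementation, which samples over motif models rather than a single scalar.

```python
import math
import random

def nested_sampling(log_likelihood, sample_prior, n_live=200, n_iter=1200, seed=0):
    """Toy nested sampling estimate of the evidence Z = integral of L(theta) d(prior).

    Hypothetical helper for illustration only. Replacement points are drawn by
    rejection sampling from the prior, which is viable only for cheap,
    low-dimensional problems.
    """
    rng = random.Random(seed)
    live = [sample_prior(rng) for _ in range(n_live)]
    live_logl = [log_likelihood(t) for t in live]
    z = 0.0
    x_prev = 1.0  # prior volume enclosed by the current likelihood contour
    for i in range(1, n_iter + 1):
        worst = min(range(n_live), key=live_logl.__getitem__)
        logl_min = live_logl[worst]
        x_i = math.exp(-i / n_live)  # expected prior volume after i shrinkage steps
        z += math.exp(logl_min) * (x_prev - x_i)
        x_prev = x_i
        # Replace the discarded point with a prior draw satisfying L > L_min.
        while True:
            theta = sample_prior(rng)
            logl = log_likelihood(theta)
            if logl > logl_min:
                break
        live[worst], live_logl[worst] = theta, logl
    # Add the contribution of the remaining live points.
    z += x_prev * sum(math.exp(l) for l in live_logl) / n_live
    return z

# Toy problem: uniform prior on [0, 1], Gaussian likelihood centred at 0.5.
SIGMA = 0.1

def toy_log_likelihood(theta):
    return -((theta - 0.5) ** 2) / (2 * SIGMA ** 2)

def uniform_prior(rng):
    return rng.random()

z_est = nested_sampling(toy_log_likelihood, uniform_prior)
# Analytic evidence for comparison (about 0.2507).
z_true = SIGMA * math.sqrt(2 * math.pi) * math.erf(0.5 / (SIGMA * math.sqrt(2)))
print(z_est, z_true)
```

The stopping rule here is a fixed iteration count chosen so that the remaining prior volume (about e^-6) contributes negligibly; real implementations usually stop adaptively once the live points can no longer change the evidence appreciably.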
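The abstract notes that NestedMICA uses multi-class background models for the "uninteresting" parts of sequences. The role of a background model in motif scoring can be sketched with a log-odds score: how much better a window is explained by a motif's position weight matrix than by background. This is a deliberately simplified sketch under assumptions: the helper names (`make_pwm`, `log_odds`, `best_hit`), the two composition-only background classes, and the rule of comparing each window against its single best-fitting class are all hypothetical; NestedMICA's own background model is more elaborate than this.

```python
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def make_pwm(consensus, p_match=0.85):
    """Toy PWM: p_match on the consensus residue, the remainder spread evenly."""
    p_other = (1.0 - p_match) / (len(AMINO_ACIDS) - 1)
    return [{aa: (p_match if aa == c else p_other) for aa in AMINO_ACIDS}
            for c in consensus]

def class_log_prob(window, composition):
    """Log-probability of a window under one composition-only background class."""
    return sum(math.log(composition[aa]) for aa in window)

def log_odds(window, pwm, background_classes):
    motif_lp = sum(math.log(col[aa]) for col, aa in zip(pwm, window))
    # Simplification: compare against the background class that best explains the window.
    best_bg_lp = max(class_log_prob(window, comp) for comp in background_classes)
    return motif_lp - best_bg_lp

def best_hit(sequence, pwm, background_classes):
    """Return (start, score) of the highest-scoring window in the sequence."""
    width = len(pwm)
    scores = [log_odds(sequence[i:i + width], pwm, background_classes)
              for i in range(len(sequence) - width + 1)]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]

# Two hypothetical background classes: uniform and hydrophobic-rich.
HYDROPHOBIC = "AILMFVW"
uniform = {aa: 1.0 / len(AMINO_ACIDS) for aa in AMINO_ACIDS}
hydrophobic = {aa: (0.1 if aa in HYDROPHOBIC
                    else 0.3 / (len(AMINO_ACIDS) - len(HYDROPHOBIC)))
               for aa in AMINO_ACIDS}
classes = [uniform, hydrophobic]
pwm = make_pwm("WCG")

print(log_odds("WCG", pwm, classes), log_odds("AAA", pwm, classes))
print(best_hit("AAAAWCGAAA", pwm, classes))
```

With more than one background class, a low-complexity or compositionally biased stretch such as "AAA" is absorbed by the class that fits it, instead of being forced into a single average background where it might look spuriously motif-like.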
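The synthetic benchmark in the abstract is built by spiking instances of known motifs into protein sequences at a chosen frequency. A minimal sketch of that kind of dataset construction follows, under explicit assumptions: the helper `spike_motif`, the choice to overwrite residues (so lengths are preserved) rather than insert them, one instance per chosen sequence, and the random strings standing in for real protein sequences are all hypothetical and not taken from the paper. The returned `truth` map records where each instance was placed, which is what a recovery benchmark would later compare against.

```python
import random

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def spike_motif(sequences, instances, fraction, rng):
    """Write one motif instance into a random subset of sequences.

    Hypothetical helper: each chosen sequence gets one randomly chosen
    instance written over existing residues, so sequence lengths are kept.
    Returns the new sequences and a map {index: (start, instance)}.
    """
    n_spiked = round(fraction * len(sequences))
    chosen = rng.sample(range(len(sequences)), n_spiked)
    spiked = list(sequences)
    truth = {}
    for idx in chosen:
        seq = spiked[idx]
        inst = rng.choice(instances)
        if len(inst) > len(seq):
            raise ValueError(f"sequence {idx} is shorter than the motif instance")
        start = rng.randrange(len(seq) - len(inst) + 1)
        spiked[idx] = seq[:start] + inst + seq[start + len(inst):]
        truth[idx] = (start, inst)
    return spiked, truth

rng = random.Random(0)
# Random stand-ins for a set of protein sequences (50 sequences of length 100).
background = ["".join(rng.choice(AMINO_ACIDS) for _ in range(100)) for _ in range(50)]
# Hypothetical variants of one short motif, spiked into 20% of the sequences.
spiked, truth = spike_motif(background, ["WCGKL", "WCGRL"], fraction=0.2, rng=rng)
print(len(truth), sorted(truth)[:3])
```

Varying `fraction` reproduces the idea of spiking motifs "at different frequencies", which is how sensitivity to rare motifs can be probed.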
Record source: PubMed (National Center for Biotechnology Information), record ID pubmed-2267705, MEDLINE/PubMed format
Journal: BMC Bioinformatics, Methodology Article (BioMed Central, 2008-01-14; record dated 2008-03-18)
Copyright © 2008 Doğruel et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.