
Evaluating deterministic motif significance measures in protein databases


Bibliographic Details
Main authors: Ferreira, Pedro Gabriel; Azevedo, Paulo J
Format: Text
Language: English
Published: BioMed Central 2007
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2254621/
https://www.ncbi.nlm.nih.gov/pubmed/18157916
http://dx.doi.org/10.1186/1748-7188-2-16
author Ferreira, Pedro Gabriel
Azevedo, Paulo J
collection PubMed
description BACKGROUND: Assessing the outcome of motif mining algorithms is an essential task, as the number of reported motifs can be very large. Significance measures play a central role in automatically ranking those motifs, thereby alleviating the analysis work. Spotting the most interesting and relevant motifs then depends on the choice of the right measures. The combined use of several measures may provide more robust results. However, caution has to be taken in order to avoid spurious evaluations. RESULTS: From the set of conducted experiments, it was verified that several of the selected significance measures show very similar behavior in a wide range of situations, therefore providing redundant information. Some measures proved to be more appropriate for ranking highly conserved motifs, while others are more appropriate for weakly conserved ones. Support appears to be a very important feature to consider for correct motif ranking. We observed that not all the measures are suitable for situations with poorly balanced class information, for instance when there is significantly less positive data than negative data. Finally, a visualization scheme was proposed that, when several measures are applied, enables easy identification of high-scoring motifs. CONCLUSION: In this work we have surveyed and categorized 14 significance measures for pattern evaluation. Their ability to rank three types of deterministic motifs was evaluated. Measures were applied in different testing conditions, where relations were identified. This study provides some pertinent insights on the choice of the right set of significance measures for the evaluation of deterministic motifs extracted from protein databases.
format Text
id pubmed-2254621
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-2254621 2008-02-27 Evaluating deterministic motif significance measures in protein databases Ferreira, Pedro Gabriel Azevedo, Paulo J Algorithms Mol Biol Research
BioMed Central 2007-12-24 /pmc/articles/PMC2254621/ /pubmed/18157916 http://dx.doi.org/10.1186/1748-7188-2-16 Text en Copyright © 2007 Ferreira and Azevedo; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Evaluating deterministic motif significance measures in protein databases
topic Research