MCOIN: a novel heuristic for determining transcription factor binding site motif width

BACKGROUND: In transcription factor binding site discovery, the true width of the motif to be discovered is generally not known a priori. The ability to compute the most likely width of a motif is therefore a highly desirable property for motif discovery algorithms. However, this is a challenging co...

Bibliographic Details
Main Authors: Kilpatrick, Alastair M; Ward, Bruce; Aitken, Stuart
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716798/
https://www.ncbi.nlm.nih.gov/pubmed/23806098
http://dx.doi.org/10.1186/1748-7188-8-16
author Kilpatrick, Alastair M
Ward, Bruce
Aitken, Stuart
collection PubMed
description BACKGROUND: In transcription factor binding site discovery, the true width of the motif to be discovered is generally not known a priori. The ability to compute the most likely width of a motif is therefore a highly desirable property for motif discovery algorithms. However, this is a challenging computational problem as a result of changing model dimensionality at changing motif widths. The complexity of the problem is increased as the discovered model at the true motif width need not be the most statistically significant in a set of candidate motif models. Further, the core motif discovery algorithm used cannot guarantee to return the best possible result at each candidate width. RESULTS: We present MCOIN, a novel heuristic for automatically determining transcription factor binding site motif width, based on motif containment and information content. Using realistic synthetic data and previously characterised prokaryotic data, we show that MCOIN outperforms the current most popular method (E-value of the resulting multiple alignment) as a predictor of motif width, based on mean absolute error. MCOIN is also shown to choose models which better match known sites at higher levels of motif conservation, based on ROC analysis. CONCLUSIONS: We demonstrate the performance of MCOIN as part of a deterministic motif discovery algorithm and conclude that MCOIN outperforms current methods for determining motif width.
format Online
Article
Text
id pubmed-3716798
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3716798 2013-07-23 MCOIN: a novel heuristic for determining transcription factor binding site motif width Kilpatrick, Alastair M; Ward, Bruce; Aitken, Stuart Algorithms Mol Biol Research BioMed Central 2013-06-27 /pmc/articles/PMC3716798/ /pubmed/23806098 http://dx.doi.org/10.1186/1748-7188-8-16 Text en Copyright © 2013 Kilpatrick et al.; licensee BioMed Central Ltd.
http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title MCOIN: a novel heuristic for determining transcription factor binding site motif width
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3716798/
https://www.ncbi.nlm.nih.gov/pubmed/23806098
http://dx.doi.org/10.1186/1748-7188-8-16