Cargando…

STEME: efficient EM to find motifs in large data sets

MEME and many other popular motif finders use the expectation–maximization (EM) algorithm to optimize their parameters. Unfortunately, the running time of EM is linear in the length of the input sequences. This can prohibit its application to data sets of the size commonly generated by high-throughp...

Descripción completa

Detalles Bibliográficos
Autores principales:	Reid, John E., Wernisch, Lorenz
Formato:	Online Artículo Texto
Lenguaje:	English
Publicado:	Oxford University Press 2011
Materias:	Methods Online
Acceso en línea:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3185442/ https://www.ncbi.nlm.nih.gov/pubmed/21785132 http://dx.doi.org/10.1093/nar/gkr574

_version_	1782213215108726784
author	Reid, John E. Wernisch, Lorenz
author_facet	Reid, John E. Wernisch, Lorenz
author_sort	Reid, John E.
collection	PubMed
description	MEME and many other popular motif finders use the expectation–maximization (EM) algorithm to optimize their parameters. Unfortunately, the running time of EM is linear in the length of the input sequences. This can prohibit its application to data sets of the size commonly generated by high-throughput biological techniques. A suffix tree is a data structure that can efficiently index a set of sequences. We describe an algorithm, Suffix Tree EM for Motif Elicitation (STEME), that approximates EM using suffix trees. To the best of our knowledge, this is the first application of suffix trees to EM. We provide an analysis of the expected running time of the algorithm and demonstrate that STEME runs an order of magnitude more quickly than the implementation of EM used by MEME. We give theoretical bounds for the quality of the approximation and show that, in practice, the approximation has a negligible effect on the outcome. We provide an open source implementation of the algorithm that we hope will be used to speed up existing and future motif search algorithms.
format	Online Article Text
id	pubmed-3185442
institution	National Center for Biotechnology Information
language	English
publishDate	2011
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-31854422011-10-04 STEME: efficient EM to find motifs in large data sets Reid, John E. Wernisch, Lorenz Nucleic Acids Res Methods Online MEME and many other popular motif finders use the expectation–maximization (EM) algorithm to optimize their parameters. Unfortunately, the running time of EM is linear in the length of the input sequences. This can prohibit its application to data sets of the size commonly generated by high-throughput biological techniques. A suffix tree is a data structure that can efficiently index a set of sequences. We describe an algorithm, Suffix Tree EM for Motif Elicitation (STEME), that approximates EM using suffix trees. To the best of our knowledge, this is the first application of suffix trees to EM. We provide an analysis of the expected running time of the algorithm and demonstrate that STEME runs an order of magnitude more quickly than the implementation of EM used by MEME. We give theoretical bounds for the quality of the approximation and show that, in practice, the approximation has a negligible effect on the outcome. We provide an open source implementation of the algorithm that we hope will be used to speed up existing and future motif search algorithms. Oxford University Press 2011-10 2011-07-23 /pmc/articles/PMC3185442/ /pubmed/21785132 http://dx.doi.org/10.1093/nar/gkr574 Text en © The Author(s) 2011. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/3.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0), which permits unrestricted non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Methods Online Reid, John E. Wernisch, Lorenz STEME: efficient EM to find motifs in large data sets
title	STEME: efficient EM to find motifs in large data sets
title_full	STEME: efficient EM to find motifs in large data sets
title_fullStr	STEME: efficient EM to find motifs in large data sets
title_full_unstemmed	STEME: efficient EM to find motifs in large data sets
title_short	STEME: efficient EM to find motifs in large data sets
title_sort	steme: efficient em to find motifs in large data sets
topic	Methods Online
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3185442/ https://www.ncbi.nlm.nih.gov/pubmed/21785132 http://dx.doi.org/10.1093/nar/gkr574
work_keys_str_mv	AT reidjohne stemeefficientemtofindmotifsinlargedatasets AT wernischlorenz stemeefficientemtofindmotifsinlargedatasets

STEME: efficient EM to find motifs in large data sets

Ejemplares similares