
STEME: A Robust, Accurate Motif Finder for Large Data Sets

Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the cor...

Bibliographic Details
Main Authors: Reid, John E., Wernisch, Lorenz
Format: Online Article Text
Language: English
Published: Public Library of Science 2014
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3953122/
https://www.ncbi.nlm.nih.gov/pubmed/24625410
http://dx.doi.org/10.1371/journal.pone.0090735
author Reid, John E.
Wernisch, Lorenz
collection PubMed
description Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface.
format Online
Article
Text
id pubmed-3953122
institution National Center for Biotechnology Information
language English
publishDate 2014
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-3953122 2014-03-18 STEME: A Robust, Accurate Motif Finder for Large Data Sets Reid, John E. Wernisch, Lorenz PLoS One Research Article Public Library of Science 2014-03-13 /pmc/articles/PMC3953122/ /pubmed/24625410 http://dx.doi.org/10.1371/journal.pone.0090735 Text en © 2014 Reid, Wernisch http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title STEME: A Robust, Accurate Motif Finder for Large Data Sets
topic Research Article