STEME: A Robust, Accurate Motif Finder for Large Data Sets
Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the cor...
Main authors: | Reid, John E.; Wernisch, Lorenz |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Public Library of Science, 2014 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3953122/ https://www.ncbi.nlm.nih.gov/pubmed/24625410 http://dx.doi.org/10.1371/journal.pone.0090735 |
_version_ | 1782307307182358528 |
---|---|
author | Reid, John E. Wernisch, Lorenz |
author_sort | Reid, John E. |
collection | PubMed |
description | Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface. |
format | Online Article Text |
id | pubmed-3953122 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | Public Library of Science |
record_format | MEDLINE/PubMed |
spelling | pubmed-39531222014-03-18 STEME: A Robust, Accurate Motif Finder for Large Data Sets Reid, John E. Wernisch, Lorenz PLoS One Research Article Motif finding is a difficult problem that has been studied for over 20 years. Some older popular motif finders are not suitable for analysis of the large data sets generated by next-generation sequencing. We recently published an efficient approximation (STEME) to the EM algorithm that is at the core of many motif finders such as MEME. This approximation allows the EM algorithm to be applied to large data sets. In this work we describe several efficient extensions to STEME that are based on the MEME algorithm. Together with the original STEME EM approximation, these extensions make STEME a fully-fledged motif finder with similar properties to MEME. We discuss the difficulty of objectively comparing motif finders. We show that STEME performs comparably to existing prominent discriminative motif finders, DREME and Trawler, on 13 sets of transcription factor binding data in mouse ES cells. We demonstrate the ability of STEME to find long degenerate motifs which these discriminative motif finders do not find. As part of our method, we extend an earlier method due to Nagarajan et al. for the efficient calculation of motif E-values. STEME's source code is available under an open source license and STEME is available via a web interface. Public Library of Science 2014-03-13 /pmc/articles/PMC3953122/ /pubmed/24625410 http://dx.doi.org/10.1371/journal.pone.0090735 Text en © 2014 Reid, Wernisch http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited. |
spellingShingle | Research Article Reid, John E. Wernisch, Lorenz STEME: A Robust, Accurate Motif Finder for Large Data Sets |
title | STEME: A Robust, Accurate Motif Finder for Large Data Sets |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3953122/ https://www.ncbi.nlm.nih.gov/pubmed/24625410 http://dx.doi.org/10.1371/journal.pone.0090735 |