
Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments

BACKGROUND: Motif analysis methods have long been central for studying biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or prote...

Bibliographic Details
Main authors: Nielsen, Morten Muhlig; Tataru, Paula; Madsen, Tobias; Hobolth, Asger; Pedersen, Jakob Skou
Format: Online Article Text
Language: English
Published: BioMed Central 2018
Subjects: Software Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6286601/
https://www.ncbi.nlm.nih.gov/pubmed/30555524
http://dx.doi.org/10.1186/s13015-018-0135-2
_version_ 1783379491449470976
author Nielsen, Morten Muhlig
Tataru, Paula
Madsen, Tobias
Hobolth, Asger
Pedersen, Jakob Skou
author_sort Nielsen, Morten Muhlig
collection PubMed
description BACKGROUND: Motif analysis methods have long been central to studying the biological function of nucleotide sequences. Functional genomics experiments extend their potential. They typically generate sequence lists ranked by an experimentally acquired functional property such as gene expression or protein binding affinity. Current motif discovery tools are limited in their ability to search large motif spaces, so more complex motifs may not be included. There is thus a need for motif analysis methods tailored to analyzing specific complex motifs motivated by biological questions and hypotheses, rather than acting as a screen-based motif-finding tool. METHODS: We present Regmex (REGular expression Motif EXplorer), which offers several methods to identify overrepresented motifs in ranked lists of sequences. Regmex uses regular expressions to define motifs or families of motifs and embedded Markov models to calculate exact p-values for motif observations in sequences. Biases in motif distributions across ranked sequence lists are evaluated using random walks, Brownian bridges, or modified rank-based statistics. A modular setup and fast analytic p-value evaluations make Regmex applicable to diverse and potentially large-scale motif analysis problems. RESULTS: We demonstrate use cases of combined motifs on simulated data and on expression data from microRNA transfection experiments. We confirm previously obtained results and demonstrate the usability of Regmex to test a specific hypothesis about the relative location of microRNA seed sites and U-rich motifs. We further compare the tool with an existing motif discovery tool and show increased sensitivity. CONCLUSIONS: Regmex is a useful and flexible tool for analyzing motif hypotheses that relate to large data sets in functional genomics. The method is available as an R package (https://github.com/muhligs/regmex).
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s13015-018-0135-2) contains supplementary material, which is available to authorized users.
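The METHODS summary above mentions regular-expression motifs and a Brownian bridge evaluation of motif bias across a ranked list. As a rough illustration of that general idea only, the Python sketch below marks each ranked sequence by whether a regular expression matches, forms a centred running sum over the ranking, and compares its maximum deviation with the asymptotic Kolmogorov distribution of a Brownian bridge. This is not the Regmex implementation: Regmex is an R package and, per the abstract, works from exact per-sequence p-values computed with embedded Markov models, whereas this sketch uses plain presence/absence. The function names, the example motif, and the small-deviation cutoff are all assumptions made for illustration.

```python
import math
import re


def motif_hits(sequences, pattern):
    """Return 1 for each sequence that contains the regex motif, else 0.

    Hypothetical helper: presence/absence only, not Regmex's exact
    per-sequence p-values.
    """
    motif = re.compile(pattern)
    return [1 if motif.search(seq) else 0 for seq in sequences]


def bridge_statistic(hits):
    """Maximum absolute deviation of the centred, scaled running sum.

    Each step adds (hit - p), where p is the overall hit fraction, so the
    walk starts and ends at zero. Dividing by sqrt(n * p * (1 - p)) makes
    the walk approximately a standard Brownian bridge when motif hits are
    unrelated to rank.
    """
    n = len(hits)
    k = sum(hits)
    if n == 0 or k == 0 or k == n:
        return 0.0  # no variation in hits, so no rank bias can be measured
    p = k / n
    scale = math.sqrt(n * p * (1 - p))
    running = 0.0
    max_dev = 0.0
    for h in hits:
        running += h - p
        max_dev = max(max_dev, abs(running) / scale)
    return max_dev


def kolmogorov_pvalue(d, terms=100):
    """Asymptotic P(sup|B| >= d) for a standard Brownian bridge B."""
    if d < 0.2:
        # Assumed cutoff: the tail probability is numerically 1 here and
        # the alternating series converges slowly for very small d.
        return 1.0
    total = sum(
        (-1) ** (m - 1) * math.exp(-2.0 * m * m * d * d)
        for m in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


if __name__ == "__main__":
    # Toy ranked list (most to least affected); sequences are made up.
    ranked = ["AAGCACTT", "GGCATTTA", "CCCCCCCC", "ACGTACGT"]
    hits = motif_hits(ranked, r"GCA[CT]TT")
    d = bridge_statistic(hits)
    print(hits, d, kolmogorov_pvalue(d))
```

The p-value here is a large-sample approximation, so it is only indicative for short lists such as the toy example; a permutation test would be a safer choice at that scale.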
format Online
Article
Text
id pubmed-6286601
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6286601 2018-12-14 Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments Nielsen, Morten Muhlig Tataru, Paula Madsen, Tobias Hobolth, Asger Pedersen, Jakob Skou Algorithms Mol Biol Software Article BioMed Central 2018-12-08 /pmc/articles/PMC6286601/ /pubmed/30555524 http://dx.doi.org/10.1186/s13015-018-0135-2 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Regmex: a statistical tool for exploring motifs in ranked sequence lists from genomics experiments
topic Software Article