Mutual enrichment in ranked lists and the statistical assessment of position weight matrix motifs
BACKGROUND: Statistics in ranked lists is useful in analysing molecular biology measurement data, such as differential expression, resulting in ranked lists of genes, or ChIP-Seq, which yields ranked lists of genomic sequences. State of the art methods study fixed motifs in ranked lists of sequences...
Main authors: | Leibovich, Limor; Yakhini, Zohar |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2014 |
Subjects: | Research |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4021615/ https://www.ncbi.nlm.nih.gov/pubmed/24708618 http://dx.doi.org/10.1186/1748-7188-9-11 |
_version_ | 1782316268356894720 |
---|---|
author | Leibovich, Limor Yakhini, Zohar |
author_sort | Leibovich, Limor |
collection | PubMed |
description | BACKGROUND: Statistics in ranked lists is useful in analysing molecular biology measurement data, such as differential expression, resulting in ranked lists of genes, or ChIP-Seq, which yields ranked lists of genomic sequences. State of the art methods study fixed motifs in ranked lists of sequences. More flexible models such as position weight matrix (PWM) motifs are more challenging in this context, partially because it is not clear how to avoid the use of arbitrary thresholds. RESULTS: To assess the enrichment of a PWM motif in a ranked list we use a second ranking on the same set of elements induced by the PWM. Possible orders of one ranked list relative to another can be modelled as permutations. Due to sample space complexity, it is difficult to accurately characterize tail distributions in the group of permutations. In this paper we develop tight upper bounds on tail distributions of the size of the intersection of the top parts of two uniformly and independently drawn permutations. We further demonstrate advantages of this approach using our software implementation, mmHG-Finder, which is publicly available, to study PWM motifs in several datasets. In addition to validating known motifs, we found GC-rich strings to be enriched amongst the promoter sequences of long non-coding RNAs that are specifically expressed in thyroid and prostate tissue samples and observed a statistical association with tissue specific CpG hypo-methylation. CONCLUSIONS: We develop tight bounds that can be calculated in polynomial time. We demonstrate utility of mutual enrichment in motif search and assess performance for synthetic and biological datasets. We suggest that thyroid and prostate-specific long non-coding RNAs are regulated by transcription factors that bind GC-rich sequences, such as EGR1, SP1 and E2F3. We further suggest that this regulation is associated with DNA hypo-methylation. |
format | Online Article Text |
id | pubmed-4021615 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4021615 2014-05-28 Mutual enrichment in ranked lists and the statistical assessment of position weight matrix motifs Leibovich, Limor Yakhini, Zohar Algorithms Mol Biol Research BioMed Central 2014-04-05 /pmc/articles/PMC4021615/ /pubmed/24708618 http://dx.doi.org/10.1186/1748-7188-9-11 Text en Copyright © 2014 Leibovich and Yakhini; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Mutual enrichment in ranked lists and the statistical assessment of position weight matrix motifs |
topic | Research |
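
The abstract describes assessing enrichment by comparing the top parts of two rankings of the same elements. A minimal sketch of that idea follows, assuming the statistic is the smallest hypergeometric tail probability over all pairs of cutoffs (n1, n2). This is an illustration only, not the mmHG-Finder implementation; the function names `hypergeom_tail` and `min_tail_over_cutoffs` are hypothetical, and the polynomial-time bounds developed in the paper are not reproduced here.

```python
from bisect import bisect_left
from math import comb


def hypergeom_tail(N, K, n, b):
    """P(X >= b) when n items are drawn from N, of which K are marked."""
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(b, min(K, n) + 1))
    return hits / total


def min_tail_over_cutoffs(rank_a, rank_b):
    """Scan every cutoff pair (n1, n2) and return (smallest tail, n1, n2).

    rank_a and rank_b list the same elements, best first. For each pair,
    b is the size of the intersection of the top n1 of rank_a with the
    top n2 of rank_b.
    """
    N = len(rank_a)
    pos_b = {x: i for i, x in enumerate(rank_b)}
    best = (1.0, 0, 0)
    for n1 in range(1, N):
        # Positions in rank_b of the top-n1 elements of rank_a, sorted
        positions = sorted(pos_b[x] for x in rank_a[:n1])
        for n2 in range(1, N):
            b = bisect_left(positions, n2)  # how many fall in the top n2
            p = hypergeom_tail(N, n1, n2, b)
            if p < best[0]:
                best = (p, n1, n2)
    return best


if __name__ == "__main__":
    same = list("abcdef")
    print(min_tail_over_cutoffs(same, same))
    print(min_tail_over_cutoffs(same, same[::-1]))
```

The minimum is taken over many cutoff pairs, so it should not be read as a p-value on its own. Bounding the tail distribution of such a quantity under uniformly and independently drawn permutations is the problem the abstract says the paper addresses.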