On the comparison of regulatory sequences with multiple resolution Entropic Profiles
BACKGROUND: Enhancers are stretches of DNA (100–1000 bp) that play a major role in developmental gene expression, evolution and disease. It has recently been shown that in high-level eukaryotes enhancers rarely work alone; instead, they collaborate by forming clusters of cis-regulatory modules (CRMs)....
Main Authors: | Comin, Matteo; Antonello, Morris |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2016 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4797186/ https://www.ncbi.nlm.nih.gov/pubmed/26987840 http://dx.doi.org/10.1186/s12859-016-0980-2 |
_version_ | 1782421905426350080 |
---|---|
author | Comin, Matteo; Antonello, Morris |
collection | PubMed |
description | BACKGROUND: Enhancers are stretches of DNA (100–1000 bp) that play a major role in developmental gene expression, evolution and disease. It has recently been shown that in high-level eukaryotes enhancers rarely work alone; instead, they collaborate by forming clusters of cis-regulatory modules (CRMs). Although the binding of transcription factors is sequence-specific, identifying functionally similar enhancers is very difficult and cannot be carried out with traditional alignment-based techniques. RESULTS: Fast similarity measures, such as alignment-free measures, are crucial for detecting related regulatory sequences and for understanding the functional correlation between two enhancers. In this paper we study the use of alignment-free measures for the classification of CRMs. However, alignment-free measures are generally tied to a fixed resolution k. Here we propose an alignment-free statistic, called [Formula: see text], that is based on multiple-resolution patterns derived from the Entropic Profiles (EPs). The Entropic Profile is a function of the genomic location that captures the importance of that region with respect to the whole genome. As a byproduct, we provide a formula to compute the exact variance of variable-length word counts, a result that may be of general interest in other applications. CONCLUSIONS: We evaluate several alignment-free statistics on simulated data and real mouse ChIP-seq sequences. The new statistic, [Formula: see text], is highly successful in discriminating functionally related enhancers and, in almost all experiments, it outperforms fixed-resolution methods. We implemented the new alignment-free measures, as well as traditional ones, in a software tool called EP-sim that is freely available: http://www.dei.unipd.it/~ciompin/main/EP-sim.html. |
format | Online Article Text |
id | pubmed-4797186 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2016 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-4797186 2016-03-18 On the comparison of regulatory sequences with multiple resolution Entropic Profiles. Comin, Matteo; Antonello, Morris. BMC Bioinformatics. Research Article. BioMed Central 2016-03-18 /pmc/articles/PMC4797186/ /pubmed/26987840 http://dx.doi.org/10.1186/s12859-016-0980-2 Text en © Comin and Antonello. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | On the comparison of regulatory sequences with multiple resolution Entropic Profiles |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4797186/ https://www.ncbi.nlm.nih.gov/pubmed/26987840 http://dx.doi.org/10.1186/s12859-016-0980-2 |