
On the comparison of regulatory sequences with multiple resolution Entropic Profiles

BACKGROUND: Enhancers are stretches of DNA (100–1000 bp) that play a major role in development gene expression, evolution and disease. It has been recently shown that in high-level eukaryotes enhancers rarely work alone; instead, they collaborate by forming clusters of cis-regulatory modules (CRMs)....

Bibliographic Details
Main authors: Comin, Matteo; Antonello, Morris
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4797186/
https://www.ncbi.nlm.nih.gov/pubmed/26987840
http://dx.doi.org/10.1186/s12859-016-0980-2
_version_ 1782421905426350080
author Comin, Matteo
Antonello, Morris
author_facet Comin, Matteo
Antonello, Morris
author_sort Comin, Matteo
collection PubMed
description BACKGROUND: Enhancers are stretches of DNA (100–1000 bp) that play a major role in development gene expression, evolution and disease. It has been recently shown that in high-level eukaryotes enhancers rarely work alone; instead, they collaborate by forming clusters of cis-regulatory modules (CRMs). Although the binding of transcription factors is sequence-specific, the identification of functionally similar enhancers is very difficult and cannot be carried out with traditional alignment-based techniques. RESULTS: The use of fast similarity measures, like alignment-free measures, to detect related regulatory sequences is crucial to understanding the functional correlation between two enhancers. In this paper we study the use of alignment-free measures for the classification of CRMs. However, alignment-free measures are generally tied to a fixed resolution k. Here we propose an alignment-free statistic, called [Formula: see text], that is based on multiple resolution patterns derived from the Entropic Profiles (EPs). The Entropic Profile is a function of the genomic location that captures the importance of that region with respect to the whole genome. As a byproduct we provide a formula to compute the exact variance of variable length word counts, a result that may also be of general interest in other applications. CONCLUSIONS: We evaluate several alignment-free statistics on simulated data and real mouse ChIP-seq sequences. The new statistic, [Formula: see text], is highly successful in discriminating functionally related enhancers and, in almost all experiments, it outperforms fixed-resolution methods. We implemented the new alignment-free measures, as well as traditional ones, in a software tool called EP-sim that is freely available: http://www.dei.unipd.it/~ciompin/main/EP-sim.html.
format Online
Article
Text
id pubmed-4797186
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-47971862016-03-18 On the comparison of regulatory sequences with multiple resolution Entropic Profiles Comin, Matteo Antonello, Morris BMC Bioinformatics Research Article
BioMed Central 2016-03-18 /pmc/articles/PMC4797186/ /pubmed/26987840 http://dx.doi.org/10.1186/s12859-016-0980-2 Text en © Comin and Antonello. 2016 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research Article
Comin, Matteo
Antonello, Morris
On the comparison of regulatory sequences with multiple resolution Entropic Profiles
title On the comparison of regulatory sequences with multiple resolution Entropic Profiles
title_full On the comparison of regulatory sequences with multiple resolution Entropic Profiles
title_fullStr On the comparison of regulatory sequences with multiple resolution Entropic Profiles
title_full_unstemmed On the comparison of regulatory sequences with multiple resolution Entropic Profiles
title_short On the comparison of regulatory sequences with multiple resolution Entropic Profiles
title_sort on the comparison of regulatory sequences with multiple resolution entropic profiles
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4797186/
https://www.ncbi.nlm.nih.gov/pubmed/26987840
http://dx.doi.org/10.1186/s12859-016-0980-2
work_keys_str_mv AT cominmatteo onthecomparisonofregulatorysequenceswithmultipleresolutionentropicprofiles
AT antonellomorris onthecomparisonofregulatorysequenceswithmultipleresolutionentropicprofiles