Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform

BACKGROUND: Larger variation exists in epigenomes than in genomes, as a single genome shapes the identity of multiple cell types. With the advent of next-generation sequencing, one of the key problems in computational epigenomics is the poor understanding of correlations and quantitative differences...

Bibliographic Details
Main Authors: Madrigal, Pedro, Krajewski, Paweł
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4488123/
https://www.ncbi.nlm.nih.gov/pubmed/26140054
http://dx.doi.org/10.1186/s13040-015-0051-7
author Madrigal, Pedro
Krajewski, Paweł
author_facet Madrigal, Pedro
Krajewski, Paweł
author_sort Madrigal, Pedro
collection PubMed
description BACKGROUND: Larger variation exists in epigenomes than in genomes, as a single genome shapes the identity of multiple cell types. With the advent of next-generation sequencing, one of the key problems in computational epigenomics is the poor understanding of correlations and quantitative differences between large scale data sets. RESULTS: Here we bring to genomics a scenario of functional principal component analysis, a finite Karhunen-Loève transform, and explicitly decompose the variation in the coverage profiles of 27 chromatin mark ChIP-seq datasets at transcription start sites for H1, one of the most used human embryonic stem cell lines. Using this approach we identify positive correlations between H3K4me3 and H3K36me3, as well as between H3K9ac and H3K36me3, so far undetected by the most commonly used Pearson correlation between read enrichment coverages. We uncover highly negative correlations between H2A.Z, H3K4me3, and several histone acetylation marks, but these occur only between principal components of first and second order. We also demonstrate that levels of gene expression correlate significantly with scores of components of order higher than one, demonstrating that transcriptional regulation by histone marks escapes simple one-to-one relationships. These correlations were higher in significance and magnitude in protein coding genes than in non-coding RNAs. CONCLUSIONS: In summary, we present a methodology to explore and uncover novel patterns of epigenomic variability and covariability in genomic data sets by using a functional eigenvalue decomposition of genomic data. R code is available at: http://github.com/pmb59/KLTepigenome. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-015-0051-7) contains supplementary material, which is available to authorized users.
format Online
Article
Text
id pubmed-4488123
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4488123 2015-07-03 Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform Madrigal, Pedro Krajewski, Paweł BioData Min Methodology BACKGROUND: Larger variation exists in epigenomes than in genomes, as a single genome shapes the identity of multiple cell types. With the advent of next-generation sequencing, one of the key problems in computational epigenomics is the poor understanding of correlations and quantitative differences between large scale data sets. RESULTS: Here we bring to genomics a scenario of functional principal component analysis, a finite Karhunen-Loève transform, and explicitly decompose the variation in the coverage profiles of 27 chromatin mark ChIP-seq datasets at transcription start sites for H1, one of the most used human embryonic stem cell lines. Using this approach we identify positive correlations between H3K4me3 and H3K36me3, as well as between H3K9ac and H3K36me3, so far undetected by the most commonly used Pearson correlation between read enrichment coverages. We uncover highly negative correlations between H2A.Z, H3K4me3, and several histone acetylation marks, but these occur only between principal components of first and second order. We also demonstrate that levels of gene expression correlate significantly with scores of components of order higher than one, demonstrating that transcriptional regulation by histone marks escapes simple one-to-one relationships. These correlations were higher in significance and magnitude in protein coding genes than in non-coding RNAs. CONCLUSIONS: In summary, we present a methodology to explore and uncover novel patterns of epigenomic variability and covariability in genomic data sets by using a functional eigenvalue decomposition of genomic data. R code is available at: http://github.com/pmb59/KLTepigenome.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13040-015-0051-7) contains supplementary material, which is available to authorized users. BioMed Central 2015-07-01 /pmc/articles/PMC4488123/ /pubmed/26140054 http://dx.doi.org/10.1186/s13040-015-0051-7 Text en © Madrigal and Krajewski. 2015 This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Methodology
Madrigal, Pedro
Krajewski, Paweł
Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform
title Uncovering correlated variability in epigenomic datasets using the Karhunen-Loeve transform
topic Methodology
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4488123/
https://www.ncbi.nlm.nih.gov/pubmed/26140054
http://dx.doi.org/10.1186/s13040-015-0051-7