Ranking genomic features using an information-theoretic measure of epigenetic discordance
BACKGROUND: Establishment and maintenance of DNA methylation throughout the genome is an important epigenetic mechanism that regulates gene expression whose disruption has been implicated in human diseases like cancer. It is therefore crucial to know which genes, or other genomic features of interes...
Main authors: | Jenkinson, Garrett; Abante, Jordi; Koldobskiy, Michael A.; Feinberg, Andrew P.; Goutsias, John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454630/ https://www.ncbi.nlm.nih.gov/pubmed/30961526 http://dx.doi.org/10.1186/s12859-019-2777-6 |
_version_ | 1783409575433601024 |
---|---|
author | Jenkinson, Garrett Abante, Jordi Koldobskiy, Michael A. Feinberg, Andrew P. Goutsias, John |
author_facet | Jenkinson, Garrett Abante, Jordi Koldobskiy, Michael A. Feinberg, Andrew P. Goutsias, John |
author_sort | Jenkinson, Garrett |
collection | PubMed |
description | BACKGROUND: Establishment and maintenance of DNA methylation throughout the genome is an important epigenetic mechanism that regulates gene expression whose disruption has been implicated in human diseases like cancer. It is therefore crucial to know which genes, or other genomic features of interest, exhibit significant discordance in DNA methylation between two phenotypes. We have previously proposed an approach for ranking genes based on methylation discordance within their promoter regions, determined by centering a window of fixed size at their transcription start sites. However, we cannot use this method to identify statistically significant genomic features and handle features of variable length and with missing data. RESULTS: We present a new approach for computing the statistical significance of methylation discordance within genomic features of interest in single and multiple test/reference studies. We base the proposed method on a well-articulated hypothesis testing problem that produces p- and q-values for each genomic feature, which we then use to identify and rank features based on the statistical significance of their epigenetic dysregulation. We employ the information-theoretic concept of mutual information to derive a novel test statistic, which we can evaluate by computing Jensen-Shannon distances between the probability distributions of methylation in a test and a reference sample. We design the proposed methodology to simultaneously handle biological, statistical, and technical variability in the data, as well as variable feature lengths and missing data, thus enabling its wide-spread use on any list of genomic features. This is accomplished by estimating, from reference data, the null distribution of the test statistic as a function of feature length using generalized additive regression models. Differential assessment, using normal/cancer data from healthy fetal tissue and pediatric high-grade glioma patients, illustrates the potential of our approach to greatly facilitate the exploratory phases of clinically and biologically relevant methylation studies. CONCLUSIONS: The proposed approach provides the first computational tool for statistically testing and ranking genomic features of interest based on observed DNA methylation discordance in comparative studies that accounts, in a rigorous manner, for biological, statistical, and technical variability in methylation data, as well as for variability in feature length and for missing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2777-6) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-6454630 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6454630 2019-04-19 Ranking genomic features using an information-theoretic measure of epigenetic discordance Jenkinson, Garrett Abante, Jordi Koldobskiy, Michael A. Feinberg, Andrew P. Goutsias, John BMC Bioinformatics Methodology Article BACKGROUND: Establishment and maintenance of DNA methylation throughout the genome is an important epigenetic mechanism that regulates gene expression whose disruption has been implicated in human diseases like cancer. It is therefore crucial to know which genes, or other genomic features of interest, exhibit significant discordance in DNA methylation between two phenotypes. We have previously proposed an approach for ranking genes based on methylation discordance within their promoter regions, determined by centering a window of fixed size at their transcription start sites. However, we cannot use this method to identify statistically significant genomic features and handle features of variable length and with missing data. RESULTS: We present a new approach for computing the statistical significance of methylation discordance within genomic features of interest in single and multiple test/reference studies. We base the proposed method on a well-articulated hypothesis testing problem that produces p- and q-values for each genomic feature, which we then use to identify and rank features based on the statistical significance of their epigenetic dysregulation. We employ the information-theoretic concept of mutual information to derive a novel test statistic, which we can evaluate by computing Jensen-Shannon distances between the probability distributions of methylation in a test and a reference sample. We design the proposed methodology to simultaneously handle biological, statistical, and technical variability in the data, as well as variable feature lengths and missing data, thus enabling its wide-spread use on any list of genomic features. This is accomplished by estimating, from reference data, the null distribution of the test statistic as a function of feature length using generalized additive regression models. Differential assessment, using normal/cancer data from healthy fetal tissue and pediatric high-grade glioma patients, illustrates the potential of our approach to greatly facilitate the exploratory phases of clinically and biologically relevant methylation studies. CONCLUSIONS: The proposed approach provides the first computational tool for statistically testing and ranking genomic features of interest based on observed DNA methylation discordance in comparative studies that accounts, in a rigorous manner, for biological, statistical, and technical variability in methylation data, as well as for variability in feature length and for missing data. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2777-6) contains supplementary material, which is available to authorized users. BioMed Central 2019-04-08 /pmc/articles/PMC6454630/ /pubmed/30961526 http://dx.doi.org/10.1186/s12859-019-2777-6 Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Methodology Article Jenkinson, Garrett Abante, Jordi Koldobskiy, Michael A. Feinberg, Andrew P. Goutsias, John Ranking genomic features using an information-theoretic measure of epigenetic discordance |
title | Ranking genomic features using an information-theoretic measure of epigenetic discordance |
title_full | Ranking genomic features using an information-theoretic measure of epigenetic discordance |
title_fullStr | Ranking genomic features using an information-theoretic measure of epigenetic discordance |
title_full_unstemmed | Ranking genomic features using an information-theoretic measure of epigenetic discordance |
title_short | Ranking genomic features using an information-theoretic measure of epigenetic discordance |
title_sort | ranking genomic features using an information-theoretic measure of epigenetic discordance |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454630/ https://www.ncbi.nlm.nih.gov/pubmed/30961526 http://dx.doi.org/10.1186/s12859-019-2777-6 |
work_keys_str_mv | AT jenkinsongarrett rankinggenomicfeaturesusinganinformationtheoreticmeasureofepigeneticdiscordance AT abantejordi rankinggenomicfeaturesusinganinformationtheoreticmeasureofepigeneticdiscordance AT koldobskiymichaela rankinggenomicfeaturesusinganinformationtheoreticmeasureofepigeneticdiscordance AT feinbergandrewp rankinggenomicfeaturesusinganinformationtheoreticmeasureofepigeneticdiscordance AT goutsiasjohn rankinggenomicfeaturesusinganinformationtheoreticmeasureofepigeneticdiscordance |
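
The abstract in this record mentions two generic building blocks: Jensen-Shannon distances between methylation probability distributions, and q-values used to rank features. The following is a minimal Python sketch of those two ideas only. It is not the authors' implementation: the function names, the toy distributions, and the choice of the Benjamini-Hochberg procedure for q-values are all assumptions made for illustration, and the feature-length-dependent null distribution described in the abstract is not modeled here.

```python
import math


def jensen_shannon_distance(p, q):
    """Square root of the Jensen-Shannon divergence (base-2 logs), in [0, 1].

    p and q are discrete probability distributions over the same states.
    Hypothetical helper; not taken from the article.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Mixture distribution M = (P + Q) / 2
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence, skipping zero-probability states of x
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    jsd = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    return math.sqrt(max(jsd, 0.0))  # clamp tiny negative rounding errors


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (an assumed choice of q-value)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Toy example: made-up distributions over (unmethylated, methylated)
reference = [0.9, 0.1]
test = [0.2, 0.8]
print(jensen_shannon_distance(reference, test))

# Toy p-values for four hypothetical genomic features
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.5]))
```

Features would then be ranked by ascending q-value. Base-2 logarithms are used so that the distance is bounded by 1, which is reached when the two distributions have disjoint support.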