
Ranking genomic features using an information-theoretic measure of epigenetic discordance

Bibliographic Details
Main authors: Jenkinson, Garrett; Abante, Jordi; Koldobskiy, Michael A.; Feinberg, Andrew P.; Goutsias, John
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6454630/
https://www.ncbi.nlm.nih.gov/pubmed/30961526
http://dx.doi.org/10.1186/s12859-019-2777-6
author Jenkinson, Garrett
Abante, Jordi
Koldobskiy, Michael A.
Feinberg, Andrew P.
Goutsias, John
collection PubMed
description BACKGROUND: Establishment and maintenance of DNA methylation throughout the genome is an important epigenetic mechanism that regulates gene expression, and its disruption has been implicated in human diseases such as cancer. It is therefore crucial to know which genes, or other genomic features of interest, exhibit significant discordance in DNA methylation between two phenotypes. We have previously proposed an approach for ranking genes based on methylation discordance within their promoter regions, determined by centering a window of fixed size at their transcription start sites. However, that method cannot be used to identify statistically significant genomic features, nor can it handle features of variable length or with missing data.
RESULTS: We present a new approach for computing the statistical significance of methylation discordance within genomic features of interest in single and multiple test/reference studies. We base the proposed method on a well-articulated hypothesis testing problem that produces p- and q-values for each genomic feature, which we then use to identify and rank features based on the statistical significance of their epigenetic dysregulation. We employ the information-theoretic concept of mutual information to derive a novel test statistic, which we can evaluate by computing Jensen-Shannon distances between the probability distributions of methylation in a test and a reference sample. We design the proposed methodology to simultaneously handle biological, statistical, and technical variability in the data, as well as variable feature lengths and missing data, thus enabling its widespread use on any list of genomic features. This is accomplished by estimating, from reference data, the null distribution of the test statistic as a function of feature length using generalized additive regression models. Differential assessment, using normal/cancer data from healthy fetal tissue and pediatric high-grade glioma patients, illustrates the potential of our approach to greatly facilitate the exploratory phases of clinically and biologically relevant methylation studies.
CONCLUSIONS: The proposed approach provides the first computational tool for statistically testing and ranking genomic features of interest based on observed DNA methylation discordance in comparative studies that accounts, in a rigorous manner, for biological, statistical, and technical variability in methylation data, as well as for variability in feature length and for missing data.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-019-2777-6) contains supplementary material, which is available to authorized users.
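The abstract mentions two computational ingredients that a small sketch can make concrete: a Jensen-Shannon distance between the methylation probability distributions of a test and a reference sample, and q-values used to rank features. The Python below is an illustration only and is not the authors' implementation. It assumes discrete distributions over a shared set of methylation-level bins, uses base-2 logarithms so the distance lies in [0, 1], and uses the Benjamini-Hochberg adjustment as a stand-in, since the abstract does not say how q-values are obtained. The function names and all numeric values are hypothetical.

```python
import math


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence between two
    discrete distributions defined over the same bins."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same number of bins")
    # Mixture distribution used by the Jensen-Shannon divergence.
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Kullback-Leibler divergence; zero-probability bins contribute 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    divergence = 0.5 * kl(p, m) + 0.5 * kl(q, m)
    # Guard against tiny negative values caused by rounding.
    return math.sqrt(max(divergence, 0.0))


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values (an assumed q-value method)."""
    n = len(pvalues)
    order = sorted(range(n), key=lambda i: pvalues[i])
    qvalues = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * n / rank)
        qvalues[i] = running_min
    return qvalues


# Hypothetical distributions over three methylation-level bins
# (low, intermediate, high) for one genomic feature.
reference = [0.70, 0.20, 0.10]
test = [0.15, 0.25, 0.60]
distance = jensen_shannon_distance(test, reference)

# Hypothetical per-feature p-values, ranked by their adjusted values.
p_values = [0.01, 0.04, 0.03, 0.5]
q_values = benjamini_hochberg(p_values)
ranking = sorted(range(len(p_values)), key=lambda i: q_values[i])
```

A distance of 0 means the two distributions coincide and 1 means they have no overlapping support. The feature-length-dependent null distribution described in the abstract is not modeled here.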
format Online
Article
Text
id pubmed-6454630
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-6454630 2019-04-19 BMC Bioinformatics Methodology Article BioMed Central 2019-04-08 /pmc/articles/PMC6454630/ /pubmed/30961526 http://dx.doi.org/10.1186/s12859-019-2777-6 Text en
© The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title Ranking genomic features using an information-theoretic measure of epigenetic discordance
topic Methodology Article