An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data
BACKGROUND: DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Stati...
Main authors: | Jenkinson, Garrett; Abante, Jordi; Feinberg, Andrew P.; Goutsias, John |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2018 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5842653/ https://www.ncbi.nlm.nih.gov/pubmed/29514626 http://dx.doi.org/10.1186/s12859-018-2086-5 |
_version_ | 1783304944416194560 |
---|---|
author | Jenkinson, Garrett Abante, Jordi Feinberg, Andrew P. Goutsias, John |
author_sort | Jenkinson, Garrett |
collection | PubMed |
description | BACKGROUND: DNA methylation is a stable form of epigenetic memory used by cells to control gene expression. Whole genome bisulfite sequencing (WGBS) has emerged as a gold-standard experimental technique for studying DNA methylation by producing high resolution genome-wide methylation profiles. Statistical modeling and analysis is employed to computationally extract and quantify information from these profiles in an effort to identify regions of the genome that demonstrate crucial or aberrant epigenetic behavior. However, the performance of most currently available methods for methylation analysis is hampered by their inability to directly account for statistical dependencies between neighboring methylation sites, thus ignoring significant information available in WGBS reads. RESULTS: We present a powerful information-theoretic approach for genome-wide modeling and analysis of WGBS data based on the 1D Ising model of statistical physics. This approach takes into account correlations in methylation by utilizing a joint probability model that encapsulates all information available in WGBS methylation reads and produces accurate results even when applied on single WGBS samples with low coverage. Using the Shannon entropy, our approach provides a rigorous quantification of methylation stochasticity in individual WGBS samples genome-wide. Furthermore, it utilizes the Jensen-Shannon distance to evaluate differences in methylation distributions between a test and a reference sample. Differential performance assessment using simulated and real human lung normal/cancer data demonstrate a clear superiority of our approach over DSS, a recently proposed method for WGBS data analysis. Critically, these results demonstrate that marginal methods become statistically invalid when correlations are present in the data. CONCLUSIONS: This contribution demonstrates clear benefits and the necessity of modeling joint probability distributions of methylation using the 1D Ising model of statistical physics and of quantifying methylation stochasticity using concepts from information theory. By employing this methodology, substantial improvement of DNA methylation analysis can be achieved by effectively taking into account the massive amount of statistical information available in WGBS data, which is largely ignored by existing methods. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2086-5) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-5842653 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2018 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5842653 2018-03-14 An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data Jenkinson, Garrett Abante, Jordi Feinberg, Andrew P. Goutsias, John BMC Bioinformatics Methodology Article BioMed Central 2018-03-07 /pmc/articles/PMC5842653/ /pubmed/29514626 http://dx.doi.org/10.1186/s12859-018-2086-5 Text en © The Author(s) 2018 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | An information-theoretic approach to the modeling and analysis of whole-genome bisulfite sequencing data |
topic | Methodology Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5842653/ https://www.ncbi.nlm.nih.gov/pubmed/29514626 http://dx.doi.org/10.1186/s12859-018-2086-5 |
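
The abstract names two information-theoretic quantities: the Shannon entropy, used to quantify methylation stochasticity, and the Jensen-Shannon distance, used to compare a test sample against a reference. The following is a minimal sketch of their standard textbook definitions for discrete distributions, not the authors' implementation. The article's Ising-model estimation step is not reproduced, and the function names and example distributions are hypothetical.

```python
import math


def shannon_entropy(p):
    """Shannon entropy, in bits, of a discrete probability distribution."""
    # Terms with zero probability contribute nothing (0 * log 0 := 0).
    return -sum(x * math.log2(x) for x in p if x > 0)


def jensen_shannon_distance(p, q):
    """Square root of the base-2 Jensen-Shannon divergence; lies in [0, 1]."""
    # Mixture distribution halfway between p and q.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    divergence = shannon_entropy(m) - (shannon_entropy(p) + shannon_entropy(q)) / 2
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(max(divergence, 0.0))


# Hypothetical distributions over three methylation levels (assumed values).
reference = [0.7, 0.2, 0.1]
test = [0.1, 0.2, 0.7]

print(shannon_entropy(reference))
print(jensen_shannon_distance(reference, test))
```

With base-2 logarithms the distance is 0 for identical distributions and 1 for distributions with disjoint support, which makes values comparable across genomic regions.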