A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data

Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage a...

Bibliographic Details
Main authors: Lea, Amanda J., Tung, Jenny, Zhou, Xiang
Format: Online Article Text
Language: English
Published: Public Library of Science 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4657956/
https://www.ncbi.nlm.nih.gov/pubmed/26599596
http://dx.doi.org/10.1371/journal.pgen.1005650
_version_ 1782402443472011264
author Lea, Amanda J.
Tung, Jenny
Zhou, Xiang
collection PubMed
description Identifying sources of variation in DNA methylation levels is important for understanding gene regulation. Recently, bisulfite sequencing has become a popular tool for investigating DNA methylation levels. However, modeling bisulfite sequencing data is complicated by dramatic variation in coverage across sites and individual samples, and by the computational challenges of controlling for genetic covariance in count data. To address these challenges, we present a binomial mixed model and an efficient, sampling-based algorithm (MACAU: Mixed model association for count data via data augmentation) for approximate parameter estimation and p-value computation. This framework allows us to account simultaneously for both the over-dispersed, count-based nature of bisulfite sequencing data and genetic relatedness among individuals. Using simulations and two real data sets (whole genome bisulfite sequencing (WGBS) data from Arabidopsis thaliana and reduced representation bisulfite sequencing (RRBS) data from baboons), we show that our method provides well-calibrated test statistics in the presence of population structure. Further, it improves power to detect differentially methylated sites: in the RRBS data set, MACAU detected 1.6-fold more age-associated CpG sites than a beta-binomial model (the next best approach). Changes in these sites are consistent with known age-related shifts in DNA methylation levels, and are enriched near genes that are differentially expressed with age in the same population. Taken together, our results indicate that MACAU is an efficient, effective tool for analyzing bisulfite sequencing data, with particular salience to analyses of structured populations. MACAU is freely available at www.xzlab.org/software.html.
format Online
Article
Text
id pubmed-4657956
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-4657956 2015-12-02 PLoS Genet Research Article
Public Library of Science 2015-11-24 /pmc/articles/PMC4657956/ /pubmed/26599596 http://dx.doi.org/10.1371/journal.pgen.1005650 Text en © 2015 Lea et al http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title A Flexible, Efficient Binomial Mixed Model for Identifying Differential DNA Methylation in Bisulfite Sequencing Data
topic Research Article