Inferring CpG methylation signatures accumulated along human history from genetic variation catalogs

Bibliographic Details
Main authors: Si, Yichen; Zöllner, Sebastian
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10055312/
https://www.ncbi.nlm.nih.gov/pubmed/36993375
http://dx.doi.org/10.1101/2023.03.24.534151
Description
Summary: Understanding DNA methylation patterns in the human genome is a key step toward deciphering gene regulatory mechanisms and modeling mutation rate heterogeneity. While methylation rates can be measured, e.g., with bisulfite sequencing, such measures do not capture historical patterns. Here we present a new method, Methylation Hidden Markov Model (MHMM), to estimate the accumulated germline methylation signature in human population history by leveraging two properties: (1) Mutation rates of cytosine to thymine transitions at methylated CG dinucleotides are orders of magnitude higher than those in the rest of the genome. (2) Methylation levels are locally correlated, so the allele frequencies of neighboring CpGs can be used jointly to estimate methylation status. We applied MHMM to allele frequencies from the TOPMed and the gnomAD genetic variation catalogs. Our estimates are consistent with whole genome bisulfite sequencing (WGBS) measured human germ cell methylation levels at 90% of CpG sites, but we also identified ~442,000 historically methylated CpG sites that could not be captured due to sample genetic variation, and inferred methylation status for ~721,000 CpG sites that were missing from WGBS. Hypo-methylated regions identified by combining our results with experimental measures are 1.7 times more likely to recover known active genomic regions than those identified by WGBS alone. Our estimated historical methylation status can be leveraged to enhance bioinformatic analysis of germline methylation, such as annotating regulatory and inactivated genomic regions, and to provide insights into sequence evolution, including predicting mutation constraint.
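
The two properties named in the summary can be illustrated with a toy two-state hidden Markov model. The sketch below is not the authors' MHMM: it reduces each CpG site to a single yes/no observation (whether a C>T variant appears in a catalog) rather than using allele frequencies, and every numeric value (`START`, `STAY`, `P_VARIANT`) and function name is a hypothetical assumption chosen only for demonstration. It shows how a higher variant probability at methylated sites, combined with correlation between neighboring sites, yields a per-site posterior probability of methylation via the standard forward-backward algorithm.

```python
# Toy two-state HMM over consecutive CpG sites. NOT the authors' MHMM.
# All parameter values below are made-up assumptions for illustration.

START = (0.5, 0.5)            # assumed prior: (unmethylated, methylated)
STAY = 0.9                    # assumed P(neighboring site shares the state)
TRANS = ((STAY, 1 - STAY), (1 - STAY, STAY))
P_VARIANT = (0.02, 0.30)      # assumed P(C>T variant observed | state)


def emission(state, has_variant):
    """Bernoulli emission: probability of the observation given the state."""
    p = P_VARIANT[state]
    return p if has_variant else 1.0 - p


def posterior_methylated(observations):
    """Return P(methylated | all observations) for each site (forward-backward)."""
    n = len(observations)
    if n == 0:
        return []

    # Forward pass, normalized at every site to avoid numerical underflow.
    fwd = []
    prev = None
    for t, obs in enumerate(observations):
        if t == 0:
            row = [START[s] * emission(s, obs) for s in (0, 1)]
        else:
            row = [
                emission(s, obs) * sum(prev[r] * TRANS[r][s] for r in (0, 1))
                for s in (0, 1)
            ]
        total = sum(row)
        prev = [x / total for x in row]
        fwd.append(prev)

    # Backward pass, also normalized per site.
    bwd = [[1.0, 1.0] for _ in range(n)]
    for t in range(n - 2, -1, -1):
        nxt = observations[t + 1]
        row = [
            sum(TRANS[s][r] * emission(r, nxt) * bwd[t + 1][r] for r in (0, 1))
            for s in (0, 1)
        ]
        total = sum(row)
        bwd[t] = [x / total for x in row]

    # Combine both passes and renormalize to get per-site posteriors.
    posteriors = []
    for f, b in zip(fwd, bwd):
        unmeth, meth = f[0] * b[0], f[1] * b[1]
        posteriors.append(meth / (unmeth + meth))
    return posteriors


if __name__ == "__main__":
    # True = a C>T variant is observed at that CpG site (hypothetical data).
    sites = [True, True, False, True, True, False, False, False, False, False]
    for i, p in enumerate(posterior_methylated(sites)):
        print(f"site {i}: P(methylated) = {p:.3f}")
```

In this toy setting, a site with no observed variant that is flanked by variant-carrying sites receives a higher methylation posterior than a variant-free site in a long variant-free run, which is the effect of using neighboring CpGs jointly.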