
Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements

BACKGROUND: Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is critical to enable genome-wide analys...

Bibliographic Details
Main authors: Zhang, Weiwei, Spector, Tim D, Deloukas, Panos, Bell, Jordana T, Engelhardt, Barbara E
Format: Online Article Text
Language: English
Published: BioMed Central 2015
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4389802/
https://www.ncbi.nlm.nih.gov/pubmed/25616342
http://dx.doi.org/10.1186/s13059-015-0581-9
author Zhang, Weiwei
Spector, Tim D
Deloukas, Panos
Bell, Jordana T
Engelhardt, Barbara E
collection PubMed
description BACKGROUND: Recent assays for individual-specific genome-wide DNA methylation profiles have enabled epigenome-wide association studies to identify specific CpG sites associated with a phenotype. Computational prediction of CpG site-specific methylation levels is critical to enable genome-wide analyses, but current approaches tackle average methylation within a locus and are often limited to specific genomic regions. RESULTS: We characterize genome-wide DNA methylation patterns, and show that correlation among CpG sites decays rapidly, making predictions solely based on neighboring sites challenging. We built a random forest classifier to predict methylation levels at CpG site resolution using features including neighboring CpG site methylation levels and genomic distance, co-localization with coding regions, CpG islands (CGIs), and regulatory elements from the ENCODE project. Our approach achieves 92% prediction accuracy of genome-wide methylation levels at single-CpG-site precision. The accuracy increases to 98% when restricted to CpG sites within CGIs and is robust across platform and cell-type heterogeneity. Our classifier outperforms other types of classifiers and identifies features that contribute to prediction accuracy: neighboring CpG site methylation, CGIs, co-localized DNase I hypersensitive sites, transcription factor binding sites, and histone modifications were found to be most predictive of methylation levels. CONCLUSIONS: Our observations of DNA methylation patterns led us to develop a classifier to predict DNA methylation levels at CpG site resolution with high accuracy. Furthermore, our method identified genomic features that interact with DNA methylation, suggesting mechanisms involved in DNA methylation modification and regulation, and linking diverse epigenetic processes. 
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s13059-015-0581-9) contains supplementary material, which is available to authorized users.
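
The abstract names neighboring CpG site methylation levels, genomic distance, and CpG island co-localization among the classifier's features. The following is a minimal sketch of how such per-site features might be assembled, not the authors' implementation. The `CpGSite` record, the `build_features` and `in_any_interval` helpers, the feature names, and the 0.5 cutoff used to turn a methylation level into a binary label are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CpGSite:
    """Hypothetical record for one assayed CpG site."""
    chrom: str
    pos: int      # genomic coordinate of the CpG site
    beta: float   # methylation level in [0, 1]


def in_any_interval(pos, intervals):
    """Return True if pos lies inside any half-open (start, end) interval."""
    return any(start <= pos < end for start, end in intervals)


def build_features(sites, index, cgi_intervals, threshold=0.5):
    """Build a feature dict and a binary label for sites[index].

    `sites` is assumed to hold the sites of one chromosome, sorted by
    position. `threshold` is an assumed cutoff for calling a site
    methylated; the abstract does not state one.
    """
    site = sites[index]
    features = {"in_cgi": in_any_interval(site.pos, cgi_intervals)}
    # Nearest upstream and downstream neighbors: their methylation level
    # and their genomic distance from the site being predicted.
    for name, j in (("up", index - 1), ("down", index + 1)):
        if 0 <= j < len(sites):
            neighbor = sites[j]
            features[f"{name}_beta"] = neighbor.beta
            features[f"{name}_dist"] = abs(neighbor.pos - site.pos)
        else:
            # No neighbor on this side (first or last site).
            features[f"{name}_beta"] = None
            features[f"{name}_dist"] = None
    label = site.beta >= threshold
    return features, label


# Toy data, invented for the example.
sites = [
    CpGSite("chr1", 100, 0.9),
    CpGSite("chr1", 150, 0.8),
    CpGSite("chr1", 5000, 0.1),
]
cgi_intervals = [(90, 200)]
features, label = build_features(sites, 1, cgi_intervals)
```

Rows built this way could be passed to any random forest implementation, for example scikit-learn's `RandomForestClassifier`, after missing neighbor values are imputed. Regulatory-element features would be added in the same way as `in_cgi`, with one interval list per annotation.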
format Online
Article
Text
id pubmed-4389802
institution National Center for Biotechnology Information
language English
publishDate 2015
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4389802 2015-04-09 Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements Zhang, Weiwei Spector, Tim D Deloukas, Panos Bell, Jordana T Engelhardt, Barbara E Genome Biol Research BioMed Central 2015-01-24 2015 /pmc/articles/PMC4389802/ /pubmed/25616342 http://dx.doi.org/10.1186/s13059-015-0581-9 Text en © Zhang et al.; licensee BioMed Central. 2015 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited.
title Predicting genome-wide DNA methylation using methylation marks, genomic position, and DNA regulatory elements
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4389802/
https://www.ncbi.nlm.nih.gov/pubmed/25616342
http://dx.doi.org/10.1186/s13059-015-0581-9