
CpG Island Mapping by Epigenome Prediction

CpG islands were originally identified by epigenetic and functional properties, namely, absence of DNA methylation and frequent promoter association. However, this concept was quickly replaced by simple DNA sequence criteria, which allowed for genome-wide annotation of CpG islands in the absence of...


Bibliographic Details
Main Authors: Bock, Christoph; Walter, Jörn; Paulsen, Martina; Lengauer, Thomas
Format: Text
Language: English
Published: Public Library of Science 2007
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC1892605/
https://www.ncbi.nlm.nih.gov/pubmed/17559301
http://dx.doi.org/10.1371/journal.pcbi.0030110
_version_ 1782133850535624704
author Bock, Christoph
Walter, Jörn
Paulsen, Martina
Lengauer, Thomas
author_sort Bock, Christoph
collection PubMed
description CpG islands were originally identified by epigenetic and functional properties, namely, absence of DNA methylation and frequent promoter association. However, this concept was quickly replaced by simple DNA sequence criteria, which allowed for genome-wide annotation of CpG islands in the absence of large-scale epigenetic datasets. Although widely used, the current CpG island criteria incur significant disadvantages: (1) reliance on arbitrary threshold parameters that bear little biological justification, (2) failure to account for widespread heterogeneity among CpG islands, and (3) apparent lack of specificity when applied to the human genome. This study is driven by the idea that a quantitative score of “CpG island strength” that incorporates epigenetic and functional aspects can help resolve these issues. We construct an epigenome prediction pipeline that links the DNA sequence of CpG islands to their epigenetic states, including DNA methylation, histone modifications, and chromatin accessibility. By training support vector machines on epigenetic data for CpG islands on human Chromosomes 21 and 22, we identify informative DNA attributes that correlate with open versus compact chromatin structures. These DNA attributes are used to predict the epigenetic states of all CpG islands genome-wide. Combining predictions for multiple epigenetic features, we estimate the inherent CpG island strength for each CpG island in the human genome, i.e., its inherent tendency to exhibit an open and transcriptionally competent chromatin structure. We extensively validate our results on independent datasets, showing that the CpG island strength predictions are applicable and informative across different tissues and cell types, and we derive improved maps of predicted “bona fide” CpG islands. 
The mapping of CpG islands by epigenome prediction is conceptually superior to identifying CpG islands by widely used sequence criteria, since it links CpG island detection to their characteristic epigenetic and functional states. It is also superior to purely experimental epigenome mapping for CpG island detection, since it abstracts from specific properties that are limited to a single cell type or tissue. In addition, using computational epigenetics methods, we identified a high correlation between the epigenome and characteristics of the DNA sequence, a finding that emphasizes the need for a better understanding of the mechanistic links between genome and epigenome.
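For orientation, the "simple DNA sequence criteria" mentioned in the description are commonly formulated as thresholds on region length, GC content, and the observed-to-expected CpG ratio. The sketch below is not taken from this record or from the authors' pipeline. The function names and the default thresholds (200 bp, 0.5, 0.6, the values usually attributed to Gardiner-Garden and Frommer) are assumptions added purely for illustration.

```python
def cpg_stats(seq):
    """Return (GC fraction, observed/expected CpG ratio) for a DNA string."""
    seq = seq.upper()
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    c = seq.count("C")
    g = seq.count("G")
    cpg = seq.count("CG")          # CpG dinucleotides on the given strand
    gc_fraction = (c + g) / n
    expected = c * g / n           # expected CpG count if C and G were independent
    obs_exp = cpg / expected if expected > 0 else 0.0
    return gc_fraction, obs_exp


def meets_sequence_criteria(seq, min_len=200, min_gc=0.5, min_obs_exp=0.6):
    """Hypothetical helper: apply threshold-style CpG island criteria.

    The default thresholds are assumed, commonly cited values; they are
    exactly the kind of fixed cutoffs the abstract calls arbitrary.
    """
    if len(seq) < min_len:
        return False
    gc_fraction, obs_exp = cpg_stats(seq)
    return gc_fraction >= min_gc and obs_exp >= min_obs_exp
```

A yes/no test of this kind gives every passing region the same status, which is the limitation the quantitative "CpG island strength" score described above is meant to address.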
format Text
id pubmed-1892605
institution National Center for Biotechnology Information
language English
publishDate 2007
publisher Public Library of Science
record_format MEDLINE/PubMed
spelling pubmed-1892605 2007-06-30 CpG Island Mapping by Epigenome Prediction Bock, Christoph Walter, Jörn Paulsen, Martina Lengauer, Thomas PLoS Comput Biol Research Article Public Library of Science 2007-06 2007-06-08 /pmc/articles/PMC1892605/ /pubmed/17559301 http://dx.doi.org/10.1371/journal.pcbi.0030110 Text en © 2007 Bock et al. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are properly credited.
title CpG Island Mapping by Epigenome Prediction
topic Research Article