
CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome

DNA methylation is an inheritable chemical modification of cytosine, and represents one of the most important epigenetic events. Computational prediction of the DNA methylation status can be employed to speed up the genome-wide methylation profiling, and to identify the key features that are correla...


Bibliographic Details
Main authors: Zheng, Hao; Wu, Hongwei; Li, Jinping; Jiang, Shi-Wen
Format: Online Article Text
Language: English
Published: BioMed Central 2013
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3552668/
https://www.ncbi.nlm.nih.gov/pubmed/23369266
http://dx.doi.org/10.1186/1755-8794-6-S1-S13
_version_ 1782256695995531264
author Zheng, Hao
Wu, Hongwei
Li, Jinping
Jiang, Shi-Wen
author_facet Zheng, Hao
Wu, Hongwei
Li, Jinping
Jiang, Shi-Wen
author_sort Zheng, Hao
collection PubMed
description DNA methylation is an inheritable chemical modification of cytosine, and represents one of the most important epigenetic events. Computational prediction of the DNA methylation status can be employed to speed up the genome-wide methylation profiling, and to identify the key features that are correlated with various methylation patterns. Here, we develop CpGIMethPred, a set of support vector machine-based models that predict the methylation status of the CpG islands in the human genome under normal conditions. The features for prediction include those that have been previously demonstrated to be effective (CpG island specific attributes, DNA sequence composition patterns, DNA structure patterns, distribution patterns of conserved transcription factor binding sites and conserved elements, and histone methylation status) as well as those that have not been extensively explored but are likely to contribute additional information from a biological point of view (nucleosome positioning propensities, gene functions, and histone acetylation status). Statistical tests are performed to identify the features that are significantly correlated with the methylation status of the CpG islands, and principal component analysis is then performed to decorrelate the selected features. Data from the Human Epigenome Project (HEP) are used to train, validate and test the predictive models. Specifically, the models are trained and validated using the DNA methylation data obtained from CD4 lymphocytes, and are then tested for generalizability using the DNA methylation data obtained from the other 11 normal tissues and cell types.
Our experiments have shown that (1) an eight-dimensional feature space that is selected via the principal component analysis and that combines all categories of information is effective for predicting the CpG island methylation status, (2) by incorporating the information regarding the nucleosome positioning, gene functions, and histone acetylation, the models can achieve higher specificity and accuracy than the existing models while maintaining a comparable sensitivity measure, (3) the histone modification (methylation and acetylation) information contributes significantly to the prediction, and without it the performance of the models deteriorates, and (4) the predictive models generalize well to different tissues and cell types. The developed program CpGIMethPred is freely available at http://users.ece.gatech.edu/~hzheng7/CGIMetPred.zip.
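The description above compares models by sensitivity, specificity, and accuracy. As a minimal sketch of how these three measures are conventionally computed for a binary methylation call, the following Python function may help. It is not taken from the CpGIMethPred program: the function name `methylation_metrics` and the label encoding (1 = methylated CpG island, 0 = unmethylated) are assumptions made only for illustration.

```python
from typing import Dict, Sequence


def methylation_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Dict[str, float]:
    """Compute sensitivity, specificity, and accuracy for binary calls.

    Assumed encoding (hypothetical, not from the source):
    1 = methylated CpG island, 0 = unmethylated CpG island.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one CpG island is required")

    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)

    # Sensitivity: fraction of methylated islands that are called methylated.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    # Specificity: fraction of unmethylated islands that are called unmethylated.
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    # Accuracy: fraction of all islands that are called correctly.
    accuracy = (tp + tn) / len(y_true)

    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
    }


if __name__ == "__main__":
    # Toy labels for eight islands, invented purely for illustration.
    observed = [1, 1, 1, 0, 0, 0, 0, 0]
    predicted = [1, 1, 0, 0, 0, 0, 0, 1]
    print(methylation_metrics(observed, predicted))
```

With the toy labels above, two of three methylated islands and four of five unmethylated islands are called correctly. When one class is much larger than the other, accuracy alone can look high even if the smaller class is poorly predicted, which is one reason to report all three measures together.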
format Online
Article
Text
id pubmed-3552668
institution National Center for Biotechnology Information
language English
publishDate 2013
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3552668 2013-01-28 CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome Zheng, Hao Wu, Hongwei Li, Jinping Jiang, Shi-Wen BMC Med Genomics Research BioMed Central 2013-01-23 /pmc/articles/PMC3552668/ /pubmed/23369266 http://dx.doi.org/10.1186/1755-8794-6-S1-S13 Text en Copyright ©2013 Zheng et al.; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle Research
Zheng, Hao
Wu, Hongwei
Li, Jinping
Jiang, Shi-Wen
CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome
title CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome
title_full CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome
title_fullStr CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome
title_full_unstemmed CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome
title_short CpGIMethPred: computational model for predicting methylation status of CpG islands in human genome
title_sort cpgimethpred: computational model for predicting methylation status of cpg islands in human genome
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3552668/
https://www.ncbi.nlm.nih.gov/pubmed/23369266
http://dx.doi.org/10.1186/1755-8794-6-S1-S13