Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human

Bibliographic Details
Main authors: Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai
Format: Online Article Text
Language: English
Published: MDPI 2017
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5343954/
https://www.ncbi.nlm.nih.gov/pubmed/28212312
http://dx.doi.org/10.3390/ijms18020420
author Wu, Chengchao
Yao, Shixin
Li, Xinghao
Chen, Chujia
Hu, Xuehai
collection PubMed
description DNA methylation plays a significant role in transcriptional regulation by repressing transcriptional activity. Changes in the DNA methylation level are an important factor affecting the expression of target genes and downstream phenotypes. Because current experimental technologies can assay only a small proportion of CpG sites in the human genome, reliable computational models for predicting genome-wide DNA methylation are urgently needed. Here, we propose a novel algorithm that extracts sequence complexity features (seven features) and develop a support-vector-machine-based prediction model that integrates previously reported DNA composition features (trinucleotide frequency and GC content, 65 features), using the methylation profiles of human embryonic stem cells. Prediction results from 22 human chromosomes with windows of varying size showed that the 600-bp window achieved the best average accuracy, 94.7%. Comparisons with two existing methods further showed the superiority of our model, and cross-species predictions on mouse data demonstrated that the model generalizes to some extent. Finally, a statistical test comparing the experimental data and the predicted data on functional regions annotated by ChromHMM found that six of 10 regions were consistent, which implies reliable prediction of unassayed CpG sites. Accordingly, we believe that our model will be useful and reliable in predicting DNA methylation.
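The description counts 65 DNA composition features, which matches the 64 possible trinucleotides plus one GC-content value. The following is only a minimal sketch of how such a vector could be computed for one sequence window. The function name `composition_features`, the use of overlapping trinucleotides, the normalization to frequencies, and the skipping of ambiguous bases are all assumptions made for illustration. The record does not describe the authors' actual procedure, and the seven sequence complexity features and the SVM itself are not covered here.

```python
from itertools import product

BASES = "ACGT"
# All 4^3 = 64 trinucleotides in a fixed (lexicographic) order.
TRINUCLEOTIDES = ["".join(p) for p in product(BASES, repeat=3)]


def composition_features(window):
    """Return 65 values: 64 trinucleotide frequencies followed by GC content.

    Hypothetical helper: overlapping 3-mers are counted, and 3-mers that
    contain a non-ACGT character (for example N) are skipped.
    """
    seq = window.upper()
    counts = {tri: 0 for tri in TRINUCLEOTIDES}
    total = 0
    for i in range(len(seq) - 2):
        tri = seq[i:i + 3]
        if tri in counts:
            counts[tri] += 1
            total += 1
    # Normalize counts to frequencies; an empty window yields all zeros.
    freqs = [counts[tri] / total if total else 0.0 for tri in TRINUCLEOTIDES]
    # GC content is computed over unambiguous bases only.
    acgt = sum(seq.count(b) for b in BASES)
    gc = (seq.count("G") + seq.count("C")) / acgt if acgt else 0.0
    return freqs + [gc]


# Example: a synthetic 600-bp window (the window size reported as best).
features = composition_features("ACGT" * 150)
print(len(features))
```

A vector like this, computed for a window around each CpG site, could then be passed to any standard classifier. The choice of kernel and parameters would have to come from the article itself.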
format Online
Article
Text
id pubmed-5343954
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-5343954 2017-03-16 Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human. Wu, Chengchao; Yao, Shixin; Li, Xinghao; Chen, Chujia; Hu, Xuehai. Int J Mol Sci. Article. MDPI 2017-02-16. /pmc/articles/PMC5343954/ /pubmed/28212312 http://dx.doi.org/10.3390/ijms18020420 Text en © 2017 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title Genome-Wide Prediction of DNA Methylation Using DNA Composition and Sequence Complexity in Human
topic Article