
A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties



Bibliographic Details
Main authors: Pan, Gaofeng, Jiang, Limin, Tang, Jijun, Guo, Fei
Format: Online Article Text
Language: English
Published: MDPI 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5855733/
https://www.ncbi.nlm.nih.gov/pubmed/29419752
http://dx.doi.org/10.3390/ijms19020511
author Pan, Gaofeng
Jiang, Limin
Tang, Jijun
Guo, Fei
collection PubMed
description DNA methylation is an important biochemical process that is closely linked to many types of cancer. Research on DNA methylation can help us understand regulation mechanisms and epigenetic reprogramming. It is therefore important to recognize methylation sites in DNA sequences. In the past several decades, many computational methods, especially machine learning methods, have been developed since high-throughput sequencing technology became widely used in research and industry. To accurately identify whether a nucleotide residue is methylated in a specific DNA sequence context, we propose a novel method that overcomes the shortcomings of previous methods for predicting methylation sites. We use k-gram, multivariate mutual information, discrete wavelet transform, and pseudo amino acid composition to extract features, and train a sparse Bayesian learning model to predict DNA methylation. Five criteria, namely the area under the receiver operating characteristic curve (AUC), Matthews correlation coefficient (MCC), accuracy (ACC), sensitivity (SN), and specificity (SP), are used to evaluate the prediction results of our method. On the benchmark dataset, our method reached [Formula: see text] on AUC, [Formula: see text] on ACC, [Formula: see text] on MCC, and [Formula: see text] on SN. Additionally, the best AUC values on two mouse embryonic stem cell datasets profiled by scBS-seq were [Formula: see text] and [Formula: see text], respectively. Compared with other leading methods, our method achieved higher prediction accuracy, improving AUC by at least [Formula: see text]. For the convenience of other researchers, our code has been uploaded to a file hosting service and can be downloaded from: https://figshare.com/s/0697b692d802861282d3.
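The abstract lists k-gram features without describing them. A minimal sketch of one common reading, normalized frequencies of overlapping k-grams (k-mers) over the alphabet ACGT in a sequence window, is shown below. The helper names `kgram_features` and `multi_kgram_features`, the choice of k values, and the normalization are hypothetical and are not taken from the authors' code.

```python
from itertools import product


def kgram_features(seq: str, k: int = 2) -> list[float]:
    """Return normalized counts of every overlapping k-gram over ACGT.

    Hypothetical helper for illustration; the feature layout used in the
    paper may differ (for example, in the choice of k or normalization).
    """
    seq = seq.upper()
    # Fixed vocabulary in lexicographic order: AA, AC, AG, AT, CA, ... for k = 2.
    vocab = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {gram: i for i, gram in enumerate(vocab)}
    counts = [0] * len(vocab)
    total = 0
    for i in range(len(seq) - k + 1):
        gram = seq[i:i + k]
        if gram in index:  # windows containing N or other symbols are skipped
            counts[index[gram]] += 1
            total += 1
    # Divide by the number of valid windows so sequences of different lengths are comparable.
    return [c / total if total else 0.0 for c in counts]


def multi_kgram_features(seq: str, ks=(1, 2, 3)) -> list[float]:
    """Concatenate k-gram frequency vectors for several k (4 + 16 + 64 = 84 values by default)."""
    features: list[float] = []
    for k in ks:
        features.extend(kgram_features(seq, k))
    return features


if __name__ == "__main__":
    window = "ACGTTCGA"  # toy window around a candidate site (made-up sequence)
    print(len(multi_kgram_features(window)))
```

Each k contributes 4^k values, so combining k = 1, 2, 3 yields an 84-dimensional vector regardless of window length; whether the paper combines several k values this way is an assumption.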
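The discrete wavelet transform named in the abstract operates on a numeric signal, so a DNA window must first be encoded as numbers, for example one physicochemical property value per nucleotide. The record does not say which properties, wavelet, or decomposition depth the authors used. The sketch below therefore assumes the simplest case, an orthonormal Haar wavelet. The function names `haar_dwt` and `haar_multilevel` are hypothetical, and the caller supplies the numeric encoding.

```python
import math


def haar_dwt(signal: list[float]) -> tuple[list[float], list[float]]:
    """One level of the orthonormal Haar discrete wavelet transform.

    Odd-length input is padded by repeating its last sample (one of several
    possible boundary conventions; an assumption here).
    """
    x = list(signal)
    if len(x) % 2:
        x.append(x[-1])
    s = 1.0 / math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) * s for i in range(0, len(x), 2)]  # low-pass: scaled pairwise sums
    detail = [(x[i] - x[i + 1]) * s for i in range(0, len(x), 2)]  # high-pass: scaled pairwise differences
    return approx, detail


def haar_multilevel(signal: list[float], levels: int) -> list[list[float]]:
    """Apply haar_dwt repeatedly to the approximation.

    Returns [detail_1, ..., detail_L, approx_L]; stops early if the
    approximation becomes shorter than two samples.
    """
    coeffs: list[list[float]] = []
    approx = list(signal)
    for _ in range(levels):
        if len(approx) < 2:
            break
        approx, detail = haar_dwt(approx)
        coeffs.append(detail)
    coeffs.append(approx)
    return coeffs
```

Because the transform is orthonormal, the squared coefficients sum to the energy of the input. For example, `haar_dwt([1, 1, 2, 4])` gives approximation `[√2, 3√2]` and detail `[0, -√2]`. Turning the coefficient bands into fixed-length features (for instance, per-band summary statistics) is a further design choice the record does not specify.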
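The five evaluation criteria in the abstract follow standard confusion-matrix definitions: ACC = (TP + TN) / (TP + TN + FP + FN), SN = TP / (TP + FN), SP = TN / (TN + FP), and MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN)). AUC equals the probability that a randomly chosen positive example scores higher than a randomly chosen negative one. The sketch below implements these textbook formulas. The function names, the 1/0 label encoding (methylated/unmethylated), and returning 0 for an undefined MCC are assumptions rather than details from the paper.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 = methylated and 0 = unmethylated."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def threshold_metrics(y_true, y_pred):
    """ACC, SN, SP and MCC from hard (already thresholded) predictions."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    acc = (tp + tn) / (tp + tn + fp + fn)
    sn = tp / (tp + fn) if tp + fn else 0.0  # sensitivity: recall on the positive class
    sp = tn / (tn + fp) if tn + fp else 0.0  # specificity: recall on the negative class
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0  # 0 by convention when undefined
    return {"ACC": acc, "SN": sn, "SP": sp, "MCC": mcc}


def auc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

For example, `threshold_metrics([1, 1, 0, 0], [1, 0, 0, 0])` gives ACC 0.75, SN 0.5, SP 1.0 and MCC 2/√12 ≈ 0.577, and `auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1])` gives 0.75. The pairwise AUC loop is quadratic in the number of examples. For large datasets, a sort-based computation or scikit-learn's `roc_auc_score` is the usual choice.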
format Online
Article
Text
id pubmed-5855733
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher MDPI
record_format MEDLINE/PubMed
journal Int J Mol Sci, MDPI, 2018-02-08
rights © 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title A Novel Computational Method for Detecting DNA Methylation Sites with DNA Sequence Information and Physicochemical Properties
topic Article