
4mCPred-CNN—Prediction of DNA N4-Methylcytosine in the Mouse Genome Using a Convolutional Neural Network


Bibliographic Details
Main authors: Abbas, Zeeshan; Tayara, Hilal; Chong, Kil To
Format: Online article (text)
Language: English
Published: MDPI 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7924022/
https://www.ncbi.nlm.nih.gov/pubmed/33672576
http://dx.doi.org/10.3390/genes12020296
collection PubMed
description Among DNA modifications, N4-methylcytosine (4mC) is one of the most significant, and it is linked to cell proliferation and gene expression. Understanding its various biological functions requires accurate detection of 4mC sites. Although several techniques based on machine learning (ML) and convolutional neural networks (CNNs) exist for predicting 4mC sites in different genomes, no CNN-based tool existed for identifying 4mC sites in the mouse genome. In this article, a CNN-based model named 4mCPred-CNN was developed to classify 4mC locations in the mouse genome. Until now, only two ML-based models existed for this purpose; both relied on several feature encoding schemes and still left considerable room for improving prediction accuracy. Using only a single feature encoding scheme, one-hot encoding, 4mCPred-CNN outperformed both previous ML-based techniques. In a ten-fold cross-validation test, 4mCPred-CNN achieved an accuracy of 85.71% and a Matthews correlation coefficient (MCC) of 0.717. On an independent dataset, it achieved an accuracy of 87.50% with an MCC of 0.750. These results show that the proposed model can be of great use to researchers in biology and bioinformatics.
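The abstract names two technical ingredients: one-hot encoding of the input DNA sequence and the Matthews correlation coefficient (MCC) used for evaluation. The Python sketch below illustrates both under stated assumptions. The A/C/G/T channel order, the all-zero row for ambiguous bases such as N, and the function names `one_hot_encode` and `matthews_corrcoef` are hypothetical choices, not details taken from 4mCPred-CNN, and the CNN itself is not shown.

```python
from typing import List

# Hypothetical channel order (A, C, G, T); this record does not state
# the ordering used by 4mCPred-CNN.
NUCLEOTIDES = "ACGT"
BASE_INDEX = {base: i for i, base in enumerate(NUCLEOTIDES)}


def one_hot_encode(seq: str) -> List[List[int]]:
    """Encode a DNA sequence as a len(seq) x 4 matrix with one 1 per known base.

    Ambiguous bases such as 'N' become all-zero rows (an assumption).
    Input is upper-cased, so lowercase sequences are accepted.
    """
    matrix = []
    for base in seq.upper():
        row = [0] * len(NUCLEOTIDES)
        if base in BASE_INDEX:
            row[BASE_INDEX[base]] = 1
        matrix.append(row)
    return matrix


def matthews_corrcoef(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from binary confusion-matrix counts.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when any marginal sum is zero, a common convention.
    """
    denom = ((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)) ** 0.5
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    print(one_hot_encode("ACGTN"))
    print(matthews_corrcoef(tp=9, tn=9, fp=1, fn=1))
```

With hypothetical counts tp = tn = 9 and fp = fn = 1, accuracy is 18/20 = 0.90 and the MCC evaluates to (81 − 1) / 100 = 0.80. The encoded matrix can be converted to an array and fed to a 1-D convolutional layer; the sequence window length and layer configuration are not given in this record.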
id pubmed-7924022
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: Genes (Basel), MDPI, published 2021-02-20. © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).