4mCpred-EL: An Ensemble Learning Framework for Identification of DNA N(4)-Methylcytosine Sites in the Mouse Genome

DNA N(4)-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand 4mC biological functions, it is crucial to gain knowledge on its genomic distribution. In recent times, few compu...

Bibliographic Details
Main Authors: Manavalan, Balachandran; Basith, Shaherin; Shin, Tae Hwan; Lee, Da Yeon; Wei, Leyi; Lee, Gwang
Format: Online Article Text
Language: English
Published: MDPI 2019
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6912380/
https://www.ncbi.nlm.nih.gov/pubmed/31661923
http://dx.doi.org/10.3390/cells8111332
author Manavalan, Balachandran
Basith, Shaherin
Shin, Tae Hwan
Lee, Da Yeon
Wei, Leyi
Lee, Gwang
collection PubMed
description DNA N(4)-methylcytosine (4mC) is one of the key epigenetic alterations, playing essential roles in DNA replication, differentiation, cell cycle, and gene expression. To better understand the biological functions of 4mC, it is crucial to know its genomic distribution. Recently, a few computational studies, in particular machine learning (ML) approaches, have been applied to the prediction of 4mC sites. Although ML-based methods are promising for 4mC identification in other species, none are available for detecting 4mC sites in the mouse genome. Our novel computational approach, called 4mCpred-EL, is the first method for identifying 4mC sites in the mouse genome; it uses four different ML algorithms with seven feature encodings. The probabilistic values predicted from those feature encodings are then used as a feature vector and fed again into ML algorithms, whose corresponding models are integrated by ensemble learning. Our benchmarking results demonstrated that 4mCpred-EL achieved an accuracy of 0.795 and an MCC of 0.591, significantly outperforming seven other classifiers by 1.5–5.9% and 3.2–11.7%, respectively. In the independent evaluation, 4mCpred-EL attained an overall accuracy of 79.80%, which is 1.8–5.1% higher than that of seven other classifiers. We provide a user-friendly web server, also named 4mCpred-EL, which can be used as a pre-screening tool for identifying potential 4mC sites in the mouse genome.
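The two-layer scheme outlined in the description (per-encoding models whose predicted probabilities become the feature vector for a second round of learning) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the record does not name the four ML algorithms or the seven feature encodings, so a small logistic regression and two simple composition encodings stand in for them, and a single meta-model replaces the final ensemble of models. All function names are hypothetical, and the demo sequences are synthetic toy data, not real 4mC sites.

```python
import math
import random

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp so math.exp stays in range
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(w, x):
    # w holds one weight per feature plus a trailing bias term.
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logreg(X, y, epochs=200, lr=0.5):
    """Stand-in learner (assumption): logistic regression via batch gradient descent."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j, xj in enumerate(xi):
                grad[j] += err * xj
            grad[-1] += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w

def mono_composition(seq):
    """Hypothetical encoding 1: frequency of each nucleotide."""
    return [seq.count(b) / len(seq) for b in "ACGT"]

def di_composition(seq):
    """Hypothetical encoding 2: frequency of each of the 16 dinucleotides."""
    pairs = [a + b for a in "ACGT" for b in "ACGT"]
    n = len(seq) - 1
    return [sum(seq[i:i + 2] == p for i in range(n)) / n for p in pairs]

ENCODERS = [mono_composition, di_composition]

def fit_stack(seqs, labels):
    # Layer 1: one base model per feature encoding.
    base = [train_logreg([enc(s) for s in seqs], labels) for enc in ENCODERS]
    # Layer 2: base-model probabilities form the new feature vector.
    # Simplification: in-sample probabilities are used here; out-of-fold
    # predictions would be the more careful choice.
    meta_X = [[predict_proba(w, enc(s)) for w, enc in zip(base, ENCODERS)]
              for s in seqs]
    return base, train_logreg(meta_X, labels)

def stack_predict(model, seq):
    base, meta = model
    probs = [predict_proba(w, enc(seq)) for w, enc in zip(base, ENCODERS)]
    return predict_proba(meta, probs)

def accuracy_and_mcc(y_true, y_pred):
    """Accuracy and Matthews correlation coefficient for binary labels."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return (tp + tn) / len(y_true), mcc

if __name__ == "__main__":
    random.seed(0)
    # Synthetic toy sequences of arbitrary length; positives are C-rich on purpose.
    pos = ["".join(random.choices("ACGT", weights=[1, 4, 1, 1], k=21)) for _ in range(20)]
    neg = ["".join(random.choices("ACGT", k=21)) for _ in range(20)]
    seqs, labels = pos + neg, [1] * 20 + [0] * 20
    model = fit_stack(seqs, labels)
    preds = [int(stack_predict(model, s) >= 0.5) for s in seqs]
    print(accuracy_and_mcc(labels, preds))
```

The demo scores the model on its own training sequences, so the printed numbers only confirm that the pipeline runs; they say nothing about real predictive performance and are unrelated to the accuracy and MCC values reported in the description.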
format Online
Article
Text
id pubmed-6912380
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-6912380 2020-01-02 Cells Article MDPI 2019-10-28 /pmc/articles/PMC6912380/ /pubmed/31661923 http://dx.doi.org/10.3390/cells8111332 Text en © 2019 by the authors. Licensee MDPI, Basel, Switzerland.
This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title 4mCpred-EL: An Ensemble Learning Framework for Identification of DNA N(4)-Methylcytosine Sites in the Mouse Genome
topic Article