
i4mC-EL: Identifying DNA N4-Methylcytosine Sites in the Mouse Genome Using Ensemble Learning

As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) plays a crucial role in controlling gene replication and expression, the cell cycle, DNA replication, and differentiation. Accurate identification of 4mC sites is necessary for understanding their biological functions. In this paper, we use...


Bibliographic Details
Main Authors: Li, Yanjuan; Zhao, Zhengnan; Teng, Zhixia
Format: Online Article Text
Language: English
Published: Hindawi, 2021
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8187051/
https://www.ncbi.nlm.nih.gov/pubmed/34159192
http://dx.doi.org/10.1155/2021/5515342
collection PubMed
description As one of the important epigenetic modifications, DNA N4-methylcytosine (4mC) plays a crucial role in controlling gene replication and expression, the cell cycle, DNA replication, and differentiation. Accurate identification of 4mC sites is necessary for understanding their biological functions. In this paper, we use ensemble learning to develop a model, named i4mC-EL, that identifies 4mC sites in the mouse genome. First, a multifeature encoding scheme combining Kmer and EIIP (electron-ion interaction pseudopotential) features was adopted to describe the DNA sequences. Second, on the basis of this encoding scheme, we developed a stacked ensemble model in which four machine learning algorithms, namely BayesNet, NaiveBayes, LibSVM, and Voted Perceptron, serve as base classifiers whose intermediate outputs are used as input to the metaclassifier, Logistic. Experimental results on the independent test dataset show that i4mC-EL achieves an overall predictive accuracy of 82.19%, which is better than that of existing methods. A user-friendly website implementing i4mC-EL can be freely accessed at the following.
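The multifeature encoding step described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the choice of k values (here the hypothetical default `ks=(1, 2, 3)`), the normalization of Kmer counts, and the use of a simple per-position EIIP profile are all assumptions, and the helper names `kmer_frequencies`, `eiip_profile`, and `encode` are hypothetical. The EIIP constants are the commonly used per-nucleotide values.

```python
from itertools import product
from typing import Dict, List, Sequence

# Commonly used EIIP (electron-ion interaction pseudopotential) value per nucleotide.
EIIP: Dict[str, float] = {"A": 0.1260, "C": 0.1340, "G": 0.0806, "T": 0.1335}


def kmer_frequencies(seq: str, k: int) -> List[float]:
    """Hypothetical helper: frequency of every k-mer over {A, C, G, T}.

    Counts are divided by the number of length-k windows; k-mers are listed
    in lexicographic order (AA, AC, ..., TT for k = 2).
    """
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = len(seq) - k + 1
    for i in range(max(windows, 0)):
        sub = seq[i:i + k]
        if sub in counts:  # windows with ambiguous bases (e.g. N) are not counted
            counts[sub] += 1
    return [counts[m] / windows if windows > 0 else 0.0 for m in kmers]


def eiip_profile(seq: str) -> List[float]:
    """Hypothetical helper: map each nucleotide to its EIIP value (unknown bases -> 0.0)."""
    return [EIIP.get(base, 0.0) for base in seq]


def encode(seq: str, ks: Sequence[int] = (1, 2, 3)) -> List[float]:
    """Assumed fusion: concatenate Kmer frequencies for each k with the EIIP profile."""
    seq = seq.upper()
    features: List[float] = []
    for k in ks:
        features.extend(kmer_frequencies(seq, k))
    features.extend(eiip_profile(seq))
    return features


if __name__ == "__main__":
    vec = encode("ACGTAC")
    print(len(vec))  # 4 + 16 + 64 Kmer features plus 6 EIIP values
```

With `ks=(1, 2, 3)`, every sequence yields 4 + 16 + 64 = 84 Kmer features plus one EIIP value per nucleotide. Fixed-length vectors of this kind are what the base classifiers of a stacked ensemble would consume; the exact feature set used by i4mC-EL may differ.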
id pubmed-8187051
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-8187051 2021-06-21; Biomed Res Int; Research Article; Hindawi 2021-05-29; Copyright © 2021 Yanjuan Li et al. This is an open access article distributed under the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Research Article