
i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting

DNA N6-Methyladenine (6mA) is a common epigenetic modification that plays significant roles in the growth and development of plants. Identifying 6mA sites is crucial for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites of pl...


Bibliographic Details
Main Authors: Teng, Zhixia, Zhao, Zhengnan, Li, Yanjuan, Tian, Zhen, Guo, Maozu, Lu, Qianzi, Wang, Guohua
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2022
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8882731/
https://www.ncbi.nlm.nih.gov/pubmed/35237293
http://dx.doi.org/10.3389/fpls.2022.845835
_version_ 1784659763863224320
author Teng, Zhixia
Zhao, Zhengnan
Li, Yanjuan
Tian, Zhen
Guo, Maozu
Lu, Qianzi
Wang, Guohua
author_facet Teng, Zhixia
Zhao, Zhengnan
Li, Yanjuan
Tian, Zhen
Guo, Maozu
Lu, Qianzi
Wang, Guohua
author_sort Teng, Zhixia
collection PubMed
description DNA N6-Methyladenine (6mA) is a common epigenetic modification that plays significant roles in the growth and development of plants. Identifying 6mA sites is crucial for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites in plants. Firstly, DNA sequences were coded into six feature vectors with diverse strategies based on the density, physicochemical properties, and position of nucleotides, respectively. To find the best coding strategy, the feature vectors were compared on several machine learning classifiers. The results suggested that the position of nucleotides has a significant positive effect on 6mA site identification. Thus, the dinucleotide one-hot strategy, which describes the positional characteristics of nucleotides well, was employed to extract DNA features in our method. Secondly, DNA sequences of Rosaceae were randomly divided into a training dataset and a test dataset. Finally, i6mA-vote was constructed by combining five different base classifiers under a majority voting strategy and trained on the Rosaceae training dataset. i6mA-vote was evaluated on the task of predicting 6mA sites in the genomes of Rosaceae, Rice, and Arabidopsis separately. In Rosaceae, i6mA-vote achieved 0.955 on accuracy (ACC), 0.909 on Matthews correlation coefficient (MCC), 0.955 on sensitivity (SN), and 0.954 on specificity (SP). Those indicators, in the order of ACC, MCC, SN, SP, were 0.882, 0.774, 0.961, and 0.803 on Rice, while they were 0.798, 0.617, 0.666, and 0.929 on Arabidopsis. According to the indicators, our method is effective and outperforms the other methods considered. The results also illustrate that i6mA-vote performs well not only in intraspecies but also in interspecies 6mA site prediction in plants. Moreover, the specificity is distinctly lower than the sensitivity in Rice, while the opposite holds in Arabidopsis. This may result from sequence similarity among Rosaceae, Rice, and Arabidopsis.
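The pipeline summarized in the description (dinucleotide one-hot coding, majority voting over base classifiers, and the ACC/MCC/SN/SP indicators) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The 16-dimensional one-hot layout per overlapping dinucleotide, the all-zero vector for unknown bases, and the function names `dinucleotide_one_hot`, `majority_vote`, and `evaluate` are assumptions made for illustration; the record does not specify these details or name the five base classifiers.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 16 dinucleotides in a fixed order: AA, AC, AG, AT, CA, ...
DINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=2)]
INDEX = {d: i for i, d in enumerate(DINUCLEOTIDES)}


def dinucleotide_one_hot(seq):
    """Encode each overlapping dinucleotide as a 16-dim one-hot vector (flattened).

    Assumed layout: position i of the sequence contributes 16 values, so the
    feature vector keeps positional information. Unknown pairs stay all-zero.
    """
    seq = seq.upper()
    features = []
    for i in range(len(seq) - 1):
        vec = [0] * 16
        idx = INDEX.get(seq[i:i + 2])
        if idx is not None:
            vec[idx] = 1
        features.extend(vec)
    return features


def majority_vote(predictions):
    """Return the label predicted by most base classifiers.

    With an odd number of binary classifiers (e.g. five) no tie can occur.
    """
    return Counter(predictions).most_common(1)[0][0]


def evaluate(y_true, y_pred):
    """Compute ACC, MCC, SN, and SP from binary labels (1 = 6mA site)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return {
        "ACC": (tp + tn) / len(pairs) if pairs else 0.0,
        "MCC": (tp * tn - fp * fn) / denom if denom else 0.0,
        "SN": tp / (tp + fn) if (tp + fn) else 0.0,
        "SP": tn / (tn + fp) if (tn + fp) else 0.0,
    }


if __name__ == "__main__":
    # Hypothetical votes from five base classifiers for one sequence.
    print(majority_vote([1, 0, 1, 1, 0]))
    print(len(dinucleotide_one_hot("ACGT")))
    print(evaluate([1, 1, 0, 0], [1, 0, 0, 0]))
```

In this sketch a sequence of length n yields 16 × (n − 1) features, and the final label for a sequence is whichever class most of the base classifiers predict.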
format Online
Article
Text
id pubmed-8882731
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-8882731 2022-03-01 i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting Teng, Zhixia Zhao, Zhengnan Li, Yanjuan Tian, Zhen Guo, Maozu Lu, Qianzi Wang, Guohua Front Plant Sci Plant Science DNA N6-Methyladenine (6mA) is a common epigenetic modification that plays significant roles in the growth and development of plants. Identifying 6mA sites is crucial for elucidating the functions of 6mA. In this article, a novel model named i6mA-vote is developed to predict 6mA sites in plants. Firstly, DNA sequences were coded into six feature vectors with diverse strategies based on the density, physicochemical properties, and position of nucleotides, respectively. To find the best coding strategy, the feature vectors were compared on several machine learning classifiers. The results suggested that the position of nucleotides has a significant positive effect on 6mA site identification. Thus, the dinucleotide one-hot strategy, which describes the positional characteristics of nucleotides well, was employed to extract DNA features in our method. Secondly, DNA sequences of Rosaceae were randomly divided into a training dataset and a test dataset. Finally, i6mA-vote was constructed by combining five different base classifiers under a majority voting strategy and trained on the Rosaceae training dataset. i6mA-vote was evaluated on the task of predicting 6mA sites in the genomes of Rosaceae, Rice, and Arabidopsis separately. In Rosaceae, i6mA-vote achieved 0.955 on accuracy (ACC), 0.909 on Matthews correlation coefficient (MCC), 0.955 on sensitivity (SN), and 0.954 on specificity (SP). Those indicators, in the order of ACC, MCC, SN, SP, were 0.882, 0.774, 0.961, and 0.803 on Rice, while they were 0.798, 0.617, 0.666, and 0.929 on Arabidopsis. According to the indicators, our method is effective and outperforms the other methods considered. The results also illustrate that i6mA-vote performs well not only in intraspecies but also in interspecies 6mA site prediction in plants. Moreover, the specificity is distinctly lower than the sensitivity in Rice, while the opposite holds in Arabidopsis. This may result from sequence similarity among Rosaceae, Rice, and Arabidopsis. Frontiers Media S.A. 2022-02-14 /pmc/articles/PMC8882731/ /pubmed/35237293 http://dx.doi.org/10.3389/fpls.2022.845835 Text en Copyright © 2022 Teng, Zhao, Li, Tian, Guo, Lu and Wang. https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
spellingShingle Plant Science
Teng, Zhixia
Zhao, Zhengnan
Li, Yanjuan
Tian, Zhen
Guo, Maozu
Lu, Qianzi
Wang, Guohua
i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting
title i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting
title_full i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting
title_fullStr i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting
title_full_unstemmed i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting
title_short i6mA-Vote: Cross-Species Identification of DNA N6-Methyladenine Sites in Plant Genomes Based on Ensemble Learning With Voting
title_sort i6ma-vote: cross-species identification of dna n6-methyladenine sites in plant genomes based on ensemble learning with voting
topic Plant Science
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8882731/
https://www.ncbi.nlm.nih.gov/pubmed/35237293
http://dx.doi.org/10.3389/fpls.2022.845835
work_keys_str_mv AT tengzhixia i6mavotecrossspeciesidentificationofdnan6methyladeninesitesinplantgenomesbasedonensemblelearningwithvoting
AT zhaozhengnan i6mavotecrossspeciesidentificationofdnan6methyladeninesitesinplantgenomesbasedonensemblelearningwithvoting
AT liyanjuan i6mavotecrossspeciesidentificationofdnan6methyladeninesitesinplantgenomesbasedonensemblelearningwithvoting
AT tianzhen i6mavotecrossspeciesidentificationofdnan6methyladeninesitesinplantgenomesbasedonensemblelearningwithvoting
AT guomaozu i6mavotecrossspeciesidentificationofdnan6methyladeninesitesinplantgenomesbasedonensemblelearningwithvoting
AT luqianzi i6mavotecrossspeciesidentificationofdnan6methyladeninesitesinplantgenomesbasedonensemblelearningwithvoting
AT wangguohua i6mavotecrossspeciesidentificationofdnan6methyladeninesitesinplantgenomesbasedonensemblelearningwithvoting