SDM6A: A Web-Based Integrative Machine-Learning Framework for Predicting 6mA Sites in the Rice Genome

Bibliographic Details
Main authors: Basith, Shaherin, Manavalan, Balachandran, Shin, Tae Hwan, Lee, Gwang
Format: Online Article Text
Language: English
Published: American Society of Gene & Cell Therapy 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6796762/
https://www.ncbi.nlm.nih.gov/pubmed/31542696
http://dx.doi.org/10.1016/j.omtn.2019.08.011
_version_ 1783459682370715648
author Basith, Shaherin
Manavalan, Balachandran
Shin, Tae Hwan
Lee, Gwang
collection PubMed
description DNA N(6)-adenine methylation (6mA) is an epigenetic modification in prokaryotes and eukaryotes. Identifying 6mA sites in the rice genome is important for rice epigenetics and breeding, but the non-random distribution and biological functions of these sites remain unclear. Several machine-learning tools can identify 6mA sites, but their limited prediction accuracy restricts their usability in epigenetic research. Here, we developed a novel computational predictor, the Sequence-based DNA N(6)-methyladenine predictor (SDM6A), which is a two-layer ensemble approach for identifying 6mA sites in the rice genome. Unlike existing methods, which are based on single models with basic features, SDM6A explores various features, and five encoding methods were identified as appropriate for this problem. Subsequently, an optimal feature set was identified from these encodings, and corresponding models were developed individually using support vector machine and extremely randomized tree classifiers. First, all five single models were integrated via an ensemble approach to define the class for each classifier. Second, the two classifiers were integrated to generate a final prediction. SDM6A achieved robust performance on cross-validation and independent evaluation, with average accuracy and Matthews correlation coefficient (MCC) of 88.2% and 0.764, respectively. Corresponding metrics were 4.7%–11.0% and 2.3%–5.5% higher than those of existing methods, respectively. A user-friendly, publicly accessible web server (http://thegleelab.org/SDM6A) was implemented to predict novel putative 6mA sites in the rice genome.
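The two-layer integration and the MCC metric mentioned in the description can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say how the models are combined, so plain probability averaging and a 0.5 cutoff are assumed here, and the names `first_layer`, `two_layer_predict`, and `mcc` are hypothetical.

```python
from math import sqrt
from statistics import mean


def first_layer(probabilities):
    """Layer 1 (assumed): average the 6mA probabilities that one classifier
    family produced from its five encoding-specific models."""
    return mean(probabilities)


def two_layer_predict(svm_probs, ert_probs, threshold=0.5):
    """Layer 2 (assumed): average the two family-level scores and apply a cutoff.

    svm_probs / ert_probs: five probabilities each, one per encoding, from the
    support vector machine and extremely randomized tree models respectively.
    Returns (final_score, is_6mA).
    """
    svm_score = first_layer(svm_probs)
    ert_score = first_layer(ert_probs)
    final_score = (svm_score + ert_score) / 2
    return final_score, final_score >= threshold


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient from confusion-matrix counts.
    Returns 0.0 when the denominator is zero (a common convention)."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    # Illustrative probabilities only; these are not values from the paper.
    score, is_site = two_layer_predict(
        [0.9, 0.8, 0.7, 0.6, 0.5],  # hypothetical SVM outputs, one per encoding
        [0.4, 0.5, 0.6, 0.7, 0.8],  # hypothetical ERT outputs, one per encoding
    )
    print(f"final score = {score:.2f}, predicted 6mA site = {is_site}")
```

Averaging is only one possible integration rule; weighted averaging or majority voting would fit the same two-layer shape.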
format Online
Article
Text
id pubmed-6796762
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher American Society of Gene & Cell Therapy
record_format MEDLINE/PubMed
spelling pubmed-6796762 2019-10-22 SDM6A: A Web-Based Integrative Machine-Learning Framework for Predicting 6mA Sites in the Rice Genome. Basith, Shaherin; Manavalan, Balachandran; Shin, Tae Hwan; Lee, Gwang. Mol Ther Nucleic Acids. Article.
American Society of Gene & Cell Therapy 2019-08-16 /pmc/articles/PMC6796762/ /pubmed/31542696 http://dx.doi.org/10.1016/j.omtn.2019.08.011 Text en © 2019 The Authors. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).