NmSEER V2.0: a prediction tool for 2′-O-methylation sites based on random forest and multi-encoding combination
BACKGROUND: 2′-O-methylation (2′-O-me or Nm) is a post-transcriptional RNA modification in which the 2′-hydroxyl group is methylated; it is common in mRNAs and various non-coding RNAs. Previous studies revealed the significance of Nm in multiple biological processes. With Nm getting more and more attention, a revolutio...
Main authors: | Zhou, Yiran; Cui, Qinghua; Zhou, Yuan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929462/ https://www.ncbi.nlm.nih.gov/pubmed/31874624 http://dx.doi.org/10.1186/s12859-019-3265-8 |
_version_ | 1783482706038882304 |
---|---|
author | Zhou, Yiran Cui, Qinghua Zhou, Yuan |
author_sort | Zhou, Yiran |
collection | PubMed |
description | BACKGROUND: 2′-O-methylation (2′-O-me or Nm) is a post-transcriptional RNA modification in which the 2′-hydroxyl group is methylated; it is common in mRNAs and various non-coding RNAs. Previous studies revealed the significance of Nm in multiple biological processes. As Nm attracted growing attention, a revolutionary technique termed Nm-seq was developed to profile Nm sites, mainly in mRNA, with single-nucleotide resolution and high sensitivity. In a recent work supported by the Nm-seq data, we reported an in silico method for predicting Nm sites that relies on nucleotide sequence information, and established an online server named NmSEER. More recently, a higher-confidence dataset produced by refined Nm-seq became available. Therefore, in this work, we redesigned the prediction model to achieve more robust performance on the new data. RESULTS: We redesigned the prediction model from two perspectives: the machine learning algorithm and the combination of multiple encoding schemes. After optimization by 5-fold cross-validation and evaluation on an independent test set, random forest was selected as the most robust algorithm. Meanwhile, one-hot encoding, position-specific dinucleotide sequence profile and K-nucleotide frequency encoding were combined to build the final predictor. CONCLUSIONS: The updated predictor, named NmSEER V2.0, achieves accurate prediction performance (AUROC = 0.862) and has been deployed on a new server, freely available at http://www.rnanut.net/nmseer-v2/. |
format | Online Article Text |
id | pubmed-6929462 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-6929462 2019-12-30 NmSEER V2.0: a prediction tool for 2′-O-methylation sites based on random forest and multi-encoding combination Zhou, Yiran Cui, Qinghua Zhou, Yuan BMC Bioinformatics Research BioMed Central 2019-12-24 /pmc/articles/PMC6929462/ /pubmed/31874624 http://dx.doi.org/10.1186/s12859-019-3265-8 Text en © The Author(s). 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | NmSEER V2.0: a prediction tool for 2′-O-methylation sites based on random forest and multi-encoding combination |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6929462/ https://www.ncbi.nlm.nih.gov/pubmed/31874624 http://dx.doi.org/10.1186/s12859-019-3265-8 |
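
The description above names one-hot encoding and K-nucleotide frequency encoding among the features combined in the final predictor. As a rough illustration only, the sketch below shows one common way to compute these two encodings for a short RNA window. The function names, the `ACGU` alphabet, the handling of unknown bases and the choice of `k = 2` are assumptions made for this example and are not taken from the NmSEER V2.0 implementation. The position-specific dinucleotide sequence profile is left out because it depends on statistics from labelled training data.

```python
from itertools import product

# Assumed RNA alphabet; sequences written with T would need converting first.
ALPHABET = "ACGU"


def one_hot(seq):
    """Return a flat list with four indicator values per nucleotide.

    Bases outside ALPHABET (for example N) become four zeros.
    """
    vec = []
    for base in seq.upper():
        vec.extend(1 if base == letter else 0 for letter in ALPHABET)
    return vec


def knf(seq, k=2):
    """Return the relative frequency of every possible k-mer over ALPHABET.

    The k-mers are ordered as produced by itertools.product (AA, AC, AG, ...).
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = max(len(seq) - k + 1, 0)
    for i in range(windows):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip k-mers containing unknown bases
            counts[kmer] += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


def encode(seq, k=2):
    """Concatenate the one-hot and k-mer frequency features (hypothetical helper)."""
    return one_hot(seq) + knf(seq, k)
```

Feature vectors built this way could then be passed to any random forest implementation, for instance scikit-learn's `RandomForestClassifier`, and evaluated with cross-validation; the settings actually used by the authors are not given in this record.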