EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction
Main authors: | Wang, Honglei; Liu, Hui; Huang, Tao; Li, Gangshen; Zhang, Lin; Sun, Yanjing |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central, 2022 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9178860/ https://www.ncbi.nlm.nih.gov/pubmed/35676633 http://dx.doi.org/10.1186/s12859-022-04756-1 |
author | Wang, Honglei; Liu, Hui; Huang, Tao; Li, Gangshen; Zhang, Lin; Sun, Yanjing |
---|---|
collection | PubMed |
description | BACKGROUND: Recent research suggests that epitranscriptomic regulation through post-transcriptional RNA modifications is essential for all types of RNA. Accurate identification of RNA modification sites is vital for understanding their functions and regulatory mechanisms. However, traditional experimental methods for identifying RNA modification sites are relatively complicated, time-consuming, and laborious. Machine learning approaches have been applied to computational RNA sequence feature extraction and classification, and may complement experimental approaches more efficiently. Recently, convolutional neural networks (CNNs) and long short-term memory (LSTM) networks have achieved notable success in modification site prediction owing to their strong representation-learning capabilities. However, a CNN learns local responses from spatial data but cannot learn sequential correlations, whereas an LSTM specializes in sequential modeling and can access contextual representations but lacks the spatial feature extraction of a CNN. These limitations strongly motivate a prediction framework that combines natural language processing (NLP) and deep learning (DL). RESULTS: This study presents an ensemble multiscale deep learning predictor (EMDLP) that identifies RNA methylation sites using NLP and DL. It combines dilated convolution with a bidirectional LSTM (BiLSTM) to make better use of both local and global information for site prediction. The first step of EMDLP represents RNA sequences in an NLP fashion using three encodings, namely RNA word embedding, one-hot encoding, and RGloVe (an improved word-vector representation learning method based on GloVe), which capture site information from both local and global perspectives. Next, a dilated convolutional bidirectional LSTM network (DCB) is constructed, consisting of a dilated convolutional neural network (DCNN) followed by a BiLSTM, to extract features that contribute to methylation site prediction. Finally, the predictions obtained with the three encodings are combined by soft voting to improve predictive performance. Experiments on m1A and m6A show that EMDLP achieves an area under the receiver operating characteristic curve (AUROC) of 95.56% and 85.24%, respectively, outperforming state-of-the-art models. For user convenience, a user-friendly web server for EMDLP is publicly available at http://www.labiip.net/EMDLP/index.php (http://47.104.130.81/EMDLP/index.php). CONCLUSIONS: We developed a predictor for m1A and m6A methylation sites. SUPPLEMENTARY INFORMATION: The online version contains supplementary material available at 10.1186/s12859-022-04756-1. |
format | Online Article Text |
id | pubmed-9178860 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2022 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-9178860 2022-06-10; BMC Bioinformatics; Research; BioMed Central, published 2022-06-08; http://dx.doi.org/10.1186/s12859-022-04756-1; Text, en; © The Author(s) 2022. Open Access: This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit https://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data. |
title | EMDLP: Ensemble multiscale deep learning model for RNA methylation site prediction |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9178860/ https://www.ncbi.nlm.nih.gov/pubmed/35676633 http://dx.doi.org/10.1186/s12859-022-04756-1 |
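The description above names one-hot encoding and RNA word embedding as two of EMDLP's three sequence encodings. The sketch below shows one common way to turn an RNA fragment into inputs for those two branches. It assumes a fixed A/C/G/U channel order and overlapping k-mers as the "RNA words" (k = 3 is an illustrative choice, not a value stated in this record); `one_hot`, `kmer_words`, and `build_vocab` are hypothetical helper names, and RGloVe is not covered.

```python
from typing import Dict, List

# Assumed channel order for one-hot columns (hypothetical choice).
NUCLEOTIDES = "ACGU"


def normalize(seq: str) -> str:
    """Upper-case the sequence and treat DNA-style 'T' as RNA 'U'."""
    return seq.upper().replace("T", "U")


def one_hot(seq: str) -> List[List[int]]:
    """Encode a sequence as an L x 4 matrix; unknown bases (e.g. 'N') become all-zero rows."""
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    rows = []
    for base in normalize(seq):
        row = [0] * len(NUCLEOTIDES)
        if base in index:
            row[index[base]] = 1
        rows.append(row)
    return rows


def kmer_words(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers ("RNA words") with stride 1."""
    s = normalize(seq)
    return [s[i:i + k] for i in range(len(s) - k + 1)]


def build_vocab(sequences: List[str], k: int = 3) -> Dict[str, int]:
    """Assign each distinct k-mer an integer id in first-seen order; 0 is left for padding."""
    vocab: Dict[str, int] = {}
    for seq in sequences:
        for word in kmer_words(seq, k):
            if word not in vocab:
                vocab[word] = len(vocab) + 1
    return vocab


if __name__ == "__main__":
    fragment = "GGACU"  # toy fragment, not data from the paper
    print(one_hot(fragment))
    words = kmer_words(fragment)
    print(words)                      # ['GGA', 'GAC', 'ACU']
    vocab = build_vocab([fragment])
    print([vocab[w] for w in words])  # [1, 2, 3]
```

In a word-embedding branch, these integer ids would index rows of a trainable embedding matrix. The k-mer size, the window length around each candidate site, and the embedding dimension used by EMDLP are not given in this record.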
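The DCB model described above places a dilated convolutional neural network before a BiLSTM. Dilation lets a kernel of size k with dilation d span (k - 1)·d + 1 positions, so stacking stride-1 layers with growing dilation widens the receptive field without adding weights. The following is a minimal pure-Python sketch of a single-channel, unpadded 1-D dilated convolution together with the receptive-field arithmetic for a stack. The kernel values and the dilation schedule (1, 2, 4) are illustrative assumptions, not EMDLP's reported configuration, and the BiLSTM stage is omitted.

```python
from typing import List, Sequence


def dilated_conv1d(x: Sequence[float], kernel: Sequence[float], dilation: int = 1) -> List[float]:
    """Valid (unpadded) 1-D cross-correlation with a dilated kernel.

    Output position i combines x[i], x[i + d], ..., x[i + (k - 1) * d].
    """
    k = len(kernel)
    span = (k - 1) * dilation + 1  # input positions covered by one kernel application
    return [
        sum(kernel[j] * x[i + j * dilation] for j in range(k))
        for i in range(len(x) - span + 1)
    ]


def receptive_field(kernel_size: int, dilations: Sequence[int]) -> int:
    """Receptive field of a stack of stride-1 dilated convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


if __name__ == "__main__":
    x = list(range(10))  # toy signal 0..9
    print(dilated_conv1d(x, [1, 1, 1], dilation=1))  # [3, 6, 9, 12, 15, 18, 21, 24]
    print(dilated_conv1d(x, [1, 1, 1], dilation=2))  # [6, 9, 12, 15, 18, 21]
    print(receptive_field(3, [1, 2, 4]))             # 15
```

With kernel size 3, three layers at dilations 1, 2, and 4 already see 15 input positions, versus 7 for three undilated layers. In a DCB-style model, the feature maps produced by many such learned kernels would then be read position by position by the BiLSTM.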
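The description states that the results from the three encodings are combined by a soft vote. Soft voting usually means averaging the predicted class probabilities (optionally weighted) and thresholding the average, instead of counting hard 0/1 votes. The sketch below assumes equal weights by default and a 0.5 threshold. `soft_vote` is a hypothetical helper, and the probabilities are made-up toy numbers, not EMDLP outputs.

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    probs_per_model: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    threshold: float = 0.5,
) -> Tuple[List[float], List[int]]:
    """Weighted average of per-model P(site), then a threshold.

    probs_per_model[m][i] is model m's predicted probability that sample i is a site.
    Returns (averaged probabilities, 0/1 labels).
    """
    if weights is None:
        weights = [1.0] * len(probs_per_model)  # equal weights by default
    total = sum(weights)
    n_samples = len(probs_per_model[0])
    avg = [
        sum(w * probs[i] for w, probs in zip(weights, probs_per_model)) / total
        for i in range(n_samples)
    ]
    labels = [1 if p >= threshold else 0 for p in avg]
    return avg, labels


if __name__ == "__main__":
    # Toy outputs of three hypothetical encoding-specific models on two samples.
    one_hot_model = [0.90, 0.51]
    word_embedding_model = [0.80, 0.52]
    rglove_model = [0.70, 0.05]
    avg, labels = soft_vote([one_hot_model, word_embedding_model, rglove_model])
    print(labels)  # [1, 0]
    # A hard majority vote would call sample 2 positive (two of three models
    # exceed 0.5); soft voting lets the confident third model pull it below 0.5.
```

The second toy sample shows why soft voting is often preferred: it keeps each model's confidence rather than reducing it to a single vote.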
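EMDLP's results are reported as AUROC (95.56% for m1A and 85.24% for m6A). AUROC equals the probability that a randomly chosen positive site receives a higher score than a randomly chosen negative one, with ties counted as one half (the Mann-Whitney formulation). That identity gives the short, dependency-free sketch below. `auroc` is a hypothetical helper and the labels and scores are toy values. In practice a library routine such as scikit-learn's `roc_auc_score` would normally be used.

```python
from typing import Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as P(score_pos > score_neg) + 0.5 * P(tie), over all positive/negative pairs.

    Pairwise O(P * N) version: easy to read, fine for small sets.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a correct ordering
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    y = [1, 1, 0, 0, 1, 0]
    s = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]
    print(auroc(y, s))  # 8 of 9 positive/negative pairs ordered correctly: 8/9
```

Because AUROC depends only on how scores are ranked, it is threshold-free. This makes it a natural summary for comparing predictors whose raw probabilities are calibrated differently.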