EDLm(6)APred: ensemble deep learning approach for mRNA m(6)A site prediction

Bibliographic Details
Main authors: Zhang, Lin; Li, Gangshen; Li, Xiuyu; Wang, Honglei; Chen, Shutao; Liu, Hui
Format: Online article (text)
Language: English
Published: BioMed Central, 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8164815/
https://www.ncbi.nlm.nih.gov/pubmed/34051729
http://dx.doi.org/10.1186/s12859-021-04206-4
collection PubMed
description BACKGROUND: As a common and abundant RNA methylation modification, N6-methyladenosine (m(6)A) is widespread in the transcriptomes of many species and is closely related to the occurrence and development of various life processes and diseases. Accurate identification of m(6)A methylation sites has therefore become an active research topic. Most biological methods rely on high-throughput sequencing technology, which places great demands on sequencing library preparation and data analysis. Various machine learning methods have therefore been proposed that extract different types of features from sequences and then apply conventional classifiers, such as SVM and RF, to identify m(6)A methylation sites. However, identification performance depends heavily on the extracted features, which still need to be improved.
RESULTS: This paper studies feature extraction and classification of m(6)A methylation sites from a natural language processing perspective, integrating feature extraction and classification into a single model while taking into account the upstream and downstream information of m(6)A sites. One-hot encoding, RNA word embedding, and Word2vec are adopted to describe sites from the perspectives of the base itself as well as its upstream and downstream sequence. A bidirectional long short-term memory (BiLSTM) model, a well-known sequence model, was then constructed to discriminate sequences containing potential m(6)A sites. Because the three feature extraction methods capture different perspectives of m(6)A sites, an ensemble deep learning predictor (EDLm(6)APred) was finally constructed for m(6)A site prediction. Experimental results on human and mouse data sets show that EDLm(6)APred outperforms each of the single models, indicating that base, upstream, and downstream information are all essential for m(6)A site detection.
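The abstract names the three input representations but not how they are built. As a rough illustration only, the Python sketch below shows one common way to turn an RNA window into per-base one-hot vectors and into overlapping k-mer "words" with integer ids, the kind of input an embedding layer or a Word2vec model consumes, plus a soft-voting rule for combining three models' outputs. The k-mer size (k = 3), the A/C/G/U column order, the reserved id 0 for unknown words, averaging of probabilities as the ensemble rule, and all function names are hypothetical assumptions, not details of EDLm(6)APred; the BiLSTM itself is not shown.

```python
from itertools import product
from statistics import fmean

# Assumption: one-hot columns ordered A, C, G, U (not stated in the abstract).
BASES = "ACGU"


def one_hot(seq: str) -> list[list[int]]:
    """Encode each base as a 4-dim one-hot vector; 'T' is read as 'U', other symbols (e.g. 'N') are all zeros."""
    rna = seq.upper().replace("T", "U")
    return [[1 if base == b else 0 for b in BASES] for base in rna]


def kmer_tokens(seq: str, k: int = 3) -> list[str]:
    """Split a sequence into overlapping k-mers ("RNA words") with stride 1.

    Assumption: k = 3 is an illustrative choice, not a value from EDLm(6)APred.
    """
    rna = seq.upper().replace("T", "U")
    return [rna[i:i + k] for i in range(len(rna) - k + 1)]


def kmer_vocab(k: int = 3) -> dict[str, int]:
    """Assign an integer id to every k-mer over ACGU; id 0 is reserved for unknown words."""
    return {"".join(p): i + 1 for i, p in enumerate(product(BASES, repeat=k))}


def encode_words(seq: str, vocab: dict[str, int], k: int = 3) -> list[int]:
    """Map a sequence to the word ids an embedding layer (or a Word2vec lookup) would consume."""
    return [vocab.get(word, 0) for word in kmer_tokens(seq, k)]


def soft_vote(probabilities: list[float], threshold: float = 0.5) -> tuple[float, bool]:
    """Hypothetical ensemble rule: average the models' m6A probabilities, then threshold the mean."""
    mean_p = fmean(probabilities)
    return mean_p, mean_p >= threshold


if __name__ == "__main__":
    window = "GGACU"  # toy 5-nt window with the candidate A at the centre
    print(one_hot(window)[2])                  # [1, 0, 0, 0]
    print(kmer_tokens(window))                 # ['GGA', 'GAC', 'ACU']
    print(encode_words(window, kmer_vocab()))  # [41, 34, 8]
    # Made-up probabilities from three single-encoding models:
    print(soft_vote([0.9, 0.7, 0.2]))
```

Reserving id 0 lets windows that contain ambiguous bases still be encoded. A Word2vec model would typically be trained on the same k-mer token lists to learn a dense vector per word, whereas the embedding-layer route learns those vectors jointly with the classifier.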
Compared with existing m(6)A methylation site prediction models that do not use genomic features, EDLm(6)APred achieves an area under the receiver operating characteristic curve (AUROC) of 86.6% on the human data sets, indicating the effectiveness of sequential modeling of RNA. For user convenience, a webserver implementing EDLm(6)APred has been made publicly available at www.xjtlu.edu.cn/biologicalsciences/EDLm6APred.
CONCLUSIONS: Our proposed EDLm(6)APred method is a reliable predictor of m(6)A methylation sites.
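The 86.6% figure is an AUROC. As a minimal sketch, and not the authors' evaluation code, AUROC can be computed as the Mann-Whitney statistic: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as one half. The function name `roc_auc` and the toy labels and scores are hypothetical.

```python
def roc_auc(labels: list[int], scores: list[float]) -> float:
    """AUROC as the Mann-Whitney statistic.

    It is the probability that a randomly chosen positive (label 1) scores higher
    than a randomly chosen negative (label 0); tied pairs count one half.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:          # compare every positive ...
        for n in neg:      # ... against every negative
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy labels and scores, not data from the paper.
    print(roc_auc([1, 1, 0, 0], [0.9, 0.4, 0.6, 0.1]))  # 0.75
```

The pairwise loop is O(n·m) and is written for readability; for large test sets a rank-based formulation or a library routine such as scikit-learn's `roc_auc_score` gives the same value.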
id pubmed-8164815
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
Journal: BMC Bioinformatics (Methodology Article), published 2021-05-29
License: © The Author(s) 2021. Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver (https://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.