BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information

Bibliographic Details
Main Authors: Zhang, Lu; Qin, Xinyi; Liu, Min; Liu, Guangzhong; Ren, Yuxiao
Format: Online Article Text
Language: English
Published: Hindawi 2021
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8413034/
https://www.ncbi.nlm.nih.gov/pubmed/34484416
http://dx.doi.org/10.1155/2021/7764764
author Zhang, Lu
Qin, Xinyi
Liu, Min
Liu, Guangzhong
Ren, Yuxiao
collection PubMed
description As one of the most prevalent posttranscriptional modifications of RNA, N7-methylguanosine (m7G) plays an essential role in the regulation of gene expression. Accurate identification of m7G sites in the transcriptome is invaluable for revealing their potential functional mechanisms. Although high-throughput experimental methods can locate m7G sites precisely, they are expensive and time-consuming. Hence, it is imperative to design an efficient computational method that can accurately identify m7G sites. In this study, we propose a novel method that brings a BERT-based multilingual model into bioinformatics to represent the information in RNA sequences. First, we treat RNA sequences as natural-language sentences and employ the bidirectional encoder representations from transformers (BERT) model to transform them into fixed-length numerical matrices. Second, a feature selection scheme based on the elastic net method is constructed to eliminate redundant features and retain important ones. Finally, the selected feature subset is input into a stacking ensemble classifier to predict m7G sites, and the hyperparameters of the classifier are tuned with the tree-structured Parzen estimator (TPE) approach. Under 10-fold cross-validation, BERT-m7G achieves an accuracy (ACC) of 95.48% and a Matthews correlation coefficient (MCC) of 0.9100. The experimental results indicate that the proposed method significantly outperforms state-of-the-art prediction methods in the identification of m7G modifications.
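
The abstract reports ACC and MCC under 10-fold cross-validation. As a minimal sketch of how these two metrics are conventionally defined for a binary site/non-site classifier, the following uses only the Python standard library. The function names (`confusion_counts`, `accuracy`, `mcc`, `k_fold_indices`) are hypothetical and not taken from the paper, and the fold split shown here is unshuffled and unstratified, which may differ from the authors' protocol.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels, where 1 marks a positive (m7G) site."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 0:
            tn += 1
        elif t == 0 and p == 1:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when the denominator vanishes."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


def k_fold_indices(n_samples, k=10):
    """Yield (train_idx, test_idx) pairs; fold sizes differ by at most one.

    Assumption: no shuffling or stratification, purely for illustration.
    """
    indices = list(range(n_samples))
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, test
        start += size
```

In a cross-validated evaluation, each fold's held-out predictions would be passed through `confusion_counts`, and the resulting counts either pooled or averaged before computing ACC and MCC; which of the two the authors used is not stated in this record.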
format Online
Article
Text
id pubmed-8413034
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Hindawi
record_format MEDLINE/PubMed
spelling pubmed-8413034 2021-09-03 BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information. Zhang, Lu; Qin, Xinyi; Liu, Min; Liu, Guangzhong; Ren, Yuxiao. Comput Math Methods Med. Research Article. Hindawi 2021-08-25 /pmc/articles/PMC8413034/ /pubmed/34484416 http://dx.doi.org/10.1155/2021/7764764 Text en Copyright © 2021 Lu Zhang et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8413034/
https://www.ncbi.nlm.nih.gov/pubmed/34484416
http://dx.doi.org/10.1155/2021/7764764