BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information
As one of the most prevalent posttranscriptional modifications of RNA, N7-methylguanosine (m7G) plays an essential role in the regulation of gene expression. Accurate identification of m7G sites in the transcriptome is invaluable for better revealing their potential functional mechanisms. Although h...
Main authors: | Zhang, Lu; Qin, Xinyi; Liu, Min; Liu, Guangzhong; Ren, Yuxiao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Hindawi 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8413034/ https://www.ncbi.nlm.nih.gov/pubmed/34484416 http://dx.doi.org/10.1155/2021/7764764 |
_version_ | 1783747575470358528 |
---|---|
author | Zhang, Lu; Qin, Xinyi; Liu, Min; Liu, Guangzhong; Ren, Yuxiao |
author_facet | Zhang, Lu; Qin, Xinyi; Liu, Min; Liu, Guangzhong; Ren, Yuxiao |
author_sort | Zhang, Lu |
collection | PubMed |
description | As one of the most prevalent posttranscriptional modifications of RNA, N7-methylguanosine (m7G) plays an essential role in the regulation of gene expression. Accurate identification of m7G sites in the transcriptome is invaluable for revealing their potential functional mechanisms. Although high-throughput experimental methods can locate m7G sites precisely, they are expensive and time-consuming. Hence, it is imperative to design an efficient computational method that can accurately identify m7G sites. In this study, we propose a novel method that incorporates a BERT-based multilingual model into bioinformatics to represent the information in RNA sequences. First, we treat RNA sequences as natural-language sentences and employ the bidirectional encoder representations from transformers (BERT) model to transform them into fixed-length numerical matrices. Second, a feature selection scheme based on the elastic net method is constructed to eliminate redundant features and retain important ones. Finally, the selected feature subset is input into a stacking ensemble classifier to predict m7G sites, and the hyperparameters of the classifier are tuned with the tree-structured Parzen estimator (TPE) approach. Under 10-fold cross-validation, BERT-m7G achieves an accuracy (ACC) of 95.48% and a Matthews correlation coefficient (MCC) of 0.9100. The experimental results indicate that the proposed method significantly outperforms state-of-the-art prediction methods in the identification of m7G modifications. |
format | Online Article Text |
id | pubmed-8413034 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Hindawi |
record_format | MEDLINE/PubMed |
spelling | pubmed-84130342021-09-03 BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information Zhang, Lu Qin, Xinyi Liu, Min Liu, Guangzhong Ren, Yuxiao Comput Math Methods Med Research Article As one of the most prevalent posttranscriptional modifications of RNA, N7-methylguanosine (m7G) plays an essential role in the regulation of gene expression. Accurate identification of m7G sites in the transcriptome is invaluable for better revealing their potential functional mechanisms. Although high-throughput experimental methods can locate m7G sites precisely, they are overpriced and time-consuming. Hence, it is imperative to design an efficient computational method that can accurately identify the m7G sites. In this study, we propose a novel method via incorporating BERT-based multilingual model in bioinformatics to represent the information of RNA sequences. Firstly, we treat RNA sequences as natural sentences and then employ bidirectional encoder representations from transformers (BERT) model to transform them into fixed-length numerical matrices. Secondly, a feature selection scheme based on the elastic net method is constructed to eliminate redundant features and retain important features. Finally, the selected feature subset is input into a stacking ensemble classifier to predict m7G sites, and the hyperparameters of the classifier are tuned with tree-structured Parzen estimator (TPE) approach. By 10-fold cross-validation, the performance of BERT-m7G is measured with an ACC of 95.48% and an MCC of 0.9100. The experimental results indicate that the proposed method significantly outperforms state-of-the-art prediction methods in the identification of m7G modifications. Hindawi 2021-08-25 /pmc/articles/PMC8413034/ /pubmed/34484416 http://dx.doi.org/10.1155/2021/7764764 Text en Copyright © 2021 Lu Zhang et al. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Zhang, Lu Qin, Xinyi Liu, Min Liu, Guangzhong Ren, Yuxiao BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information |
title | BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information |
title_full | BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information |
title_fullStr | BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information |
title_full_unstemmed | BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information |
title_short | BERT-m7G: A Transformer Architecture Based on BERT and Stacking Ensemble to Identify RNA N7-Methylguanosine Sites from Sequence Information |
title_sort | bert-m7g: a transformer architecture based on bert and stacking ensemble to identify rna n7-methylguanosine sites from sequence information |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8413034/ https://www.ncbi.nlm.nih.gov/pubmed/34484416 http://dx.doi.org/10.1155/2021/7764764 |
work_keys_str_mv | AT zhanglu bertm7gatransformerarchitecturebasedonbertandstackingensembletoidentifyrnan7methylguanosinesitesfromsequenceinformation AT qinxinyi bertm7gatransformerarchitecturebasedonbertandstackingensembletoidentifyrnan7methylguanosinesitesfromsequenceinformation AT liumin bertm7gatransformerarchitecturebasedonbertandstackingensembletoidentifyrnan7methylguanosinesitesfromsequenceinformation AT liuguangzhong bertm7gatransformerarchitecturebasedonbertandstackingensembletoidentifyrnan7methylguanosinesitesfromsequenceinformation AT renyuxiao bertm7gatransformerarchitecturebasedonbertandstackingensembletoidentifyrnan7methylguanosinesitesfromsequenceinformation |
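
The description above reports performance as ACC and MCC. As a reading aid, the following minimal sketch shows how these two metrics are conventionally computed from binary predictions. It is not the authors' code: the function names, the label convention (1 = m7G site, 0 = non-site), and the toy labels are all assumptions made for illustration, and the numbers it produces have no relation to the results reported in the record.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count (TP, TN, FP, FN) for binary labels.

    Assumed convention (not from the record): 1 = m7G site, 0 = non-site.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp, tn, fp, fn):
    """ACC = (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient.

    MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN)).
    Returns 0.0 when the denominator is zero, a common convention.
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical toy labels, purely illustrative.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 1]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
print(f"ACC = {accuracy(tp, tn, fp, fn):.4f}")
print(f"MCC = {mcc(tp, tn, fp, fn):.4f}")
```

In a 10-fold cross-validation setting, these metrics would typically be computed per fold and averaged, or computed once on the pooled out-of-fold predictions; the record does not state which convention was used.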