LITHOPHONE: Improving lncRNA Methylation Site Prediction Using an Ensemble Predictor

Bibliographic Details
Main Authors: Liu, Lian; Lei, Xiujuan; Fang, Zengqiang; Tang, Yujiao; Meng, Jia; Wei, Zhen
Format: Online Article Text
Language: English
Published: Frontiers Media S.A. 2020
Subjects: Genetics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7297269/
https://www.ncbi.nlm.nih.gov/pubmed/32582286
http://dx.doi.org/10.3389/fgene.2020.00545
author Liu, Lian
Lei, Xiujuan
Fang, Zengqiang
Tang, Yujiao
Meng, Jia
Wei, Zhen
author_sort Liu, Lian
collection PubMed
description N(6)-methyladenosine (m(6)A) is one of the most widely studied epigenetic modifications and plays an important role in many biological processes, such as splicing, RNA localization, and degradation. Studies have shown that m(6)A on lncRNA has important functions, including regulating the expression and functions of lncRNA, regulating the synthesis of pre-mRNA, promoting the proliferation of cancer cells, and affecting cell differentiation, among others. Although a number of methods have been proposed to predict m(6)A RNA methylation sites, most of them target general m(6)A site prediction and overlook the distinct character of the lncRNA methylation prediction problem. Because many lncRNAs lack a polyA tail and are not captured in the polyA selection step of the most widely adopted RNA-seq library preparation protocol, lncRNA methylation sites cannot be effectively captured and are thus likely to be significantly underrepresented in existing experimental data, which affects the accuracy of existing predictors. In this paper, we propose a new computational framework, LITHOPHONE, which stands for long noncoding RNA methylation sites prediction from sequence characteristics and genomic information with an ensemble predictor. We show that the methylation sites of lncRNA and mRNA exhibit different patterns in the extracted features and should be handled differently when making predictions. Because of the experimental protocols used, the number of known lncRNA m(6)A sites is limited and insufficient to train a reliable predictor; performance can therefore be improved by combining lncRNA and mRNA data in an ensemble predictor. We show that the newly developed LITHOPHONE approach performed well when tested on independent datasets (AUC: 0.966 and 0.835 under full transcript and mature mRNA modes, respectively), a substantial improvement over existing methods. Additionally, LITHOPHONE was applied to scan the entire human lncRNAome for all possible lncRNA m(6)A sites, and the results are freely accessible at: http://180.208.58.19/lith/.
format Online
Article
Text
id pubmed-7297269
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Frontiers Media S.A.
record_format MEDLINE/PubMed
spelling pubmed-7297269
2020-06-23
LITHOPHONE: Improving lncRNA Methylation Site Prediction Using an Ensemble Predictor
Liu, Lian; Lei, Xiujuan; Fang, Zengqiang; Tang, Yujiao; Meng, Jia; Wei, Zhen
Front Genet
Genetics
Frontiers Media S.A. 2020-06-09
/pmc/articles/PMC7297269/
/pubmed/32582286
http://dx.doi.org/10.3389/fgene.2020.00545
Text en
Copyright © 2020 Liu, Lei, Fang, Tang, Meng and Wei. http://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title LITHOPHONE: Improving lncRNA Methylation Site Prediction Using an Ensemble Predictor
topic Genetics