EVlncRNA-Dpred: improved prediction of experimentally validated lncRNAs by deep learning

Long non-coding RNAs (lncRNAs) play essential roles in nearly every biological process and disease. Many algorithms have been developed to distinguish lncRNAs from mRNAs in transcriptomic data, facilitating the discovery of more than 600 000 lncRNAs. However, only a tiny fraction (<1%) of lncRNA...

Bibliographic Details
Main Authors: Zhou, Bailing; Ding, Maolin; Feng, Jing; Ji, Baohua; Huang, Pingping; Zhang, Junye; Yu, Xue; Cao, Zanxia; Yang, Yuedong; Zhou, Yaoqi; Wang, Jihua
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects: Problem Solving Protocol
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9851331/
https://www.ncbi.nlm.nih.gov/pubmed/36573492
http://dx.doi.org/10.1093/bib/bbac583
author Zhou, Bailing
Ding, Maolin
Feng, Jing
Ji, Baohua
Huang, Pingping
Zhang, Junye
Yu, Xue
Cao, Zanxia
Yang, Yuedong
Zhou, Yaoqi
Wang, Jihua
collection PubMed
description Long non-coding RNAs (lncRNAs) play essential roles in nearly every biological process and disease. Many algorithms have been developed to distinguish lncRNAs from mRNAs in transcriptomic data, facilitating the discovery of more than 600 000 lncRNAs. However, only a tiny fraction (<1%) of lncRNA transcripts (~4000) have been further validated by low-throughput experiments (EVlncRNAs). Given the cost and labor-intensive nature of experimental validation, computational tools are needed to prioritize potentially functional lncRNAs, because many lncRNAs from high-throughput sequencing (HTlncRNAs) could result from transcriptional noise. Here, we employed deep learning algorithms to separate EVlncRNAs from HTlncRNAs and mRNAs. To overcome the challenge of small datasets, we employed a three-layer deep-learning neural network (DNN) with a K-mer feature as the input and a small convolutional neural network (CNN) with one-hot encoding as the input. Separate models were trained for human (h), mouse (m) and plant (p). The final concatenated models (EVlncRNA-Dpred (h), EVlncRNA-Dpred (m) and EVlncRNA-Dpred (p)) provided substantial improvement over a previous model based on support vector machines (EVlncRNA-pred). For example, EVlncRNA-Dpred (h) achieved 0.896 for the area under the receiver operating characteristic curve, compared with 0.582 for the sequence-based EVlncRNA-pred model. The models developed here should be useful for screening lncRNA transcripts for experimental validation. EVlncRNA-Dpred is available as a web server at https://www.sdklab-biophysics-dzu.net/EVlncRNA-Dpred/index.html, and the data and source code are freely available along with the web server.
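The two input representations named in the description (a K-mer feature and a one-hot encoding) can be illustrated with a minimal Python sketch. This is not the authors' code: the function names `kmer_frequencies` and `one_hot`, the choice of k = 3, the fixed padding length, and the handling of ambiguous bases are all assumptions made for illustration, since this record does not state the settings used in the paper.

```python
from itertools import product

ALPHABET = "ACGT"  # assumption: RNA "U" is mapped to "T" before encoding


def kmer_frequencies(seq, k=3):
    """Return normalized k-mer frequencies in lexicographic k-mer order.

    The value of k is a hypothetical choice, not taken from the paper.
    """
    seq = seq.upper().replace("U", "T")
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    windows = 0
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if kmer in counts:  # skip windows with ambiguous bases such as N
            counts[kmer] += 1
            windows += 1
    return [counts[m] / windows if windows else 0.0 for m in kmers]


def one_hot(seq, length):
    """Encode seq as a length x 4 matrix, truncating or zero-padding.

    The fixed length is a hypothetical parameter; unknown bases map to
    an all-zero row.
    """
    seq = seq.upper().replace("U", "T")[:length]
    rows = [[1 if base == letter else 0 for letter in ALPHABET] for base in seq]
    rows.extend([0] * len(ALPHABET) for _ in range(length - len(seq)))
    return rows
```

In a setup like the one described, a vector such as `kmer_frequencies(seq)` would feed a fully connected network and a matrix such as `one_hot(seq, length)` would feed a convolutional network, with the two outputs combined afterwards. How the published models combine them is not specified in this record.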
format Online
Article
Text
id pubmed-9851331
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9851331 2023-01-20 Brief Bioinform Problem Solving Protocol Oxford University Press 2022-12-27 /pmc/articles/PMC9851331/ /pubmed/36573492 http://dx.doi.org/10.1093/bib/bbac583 Text en © The Author(s) 2022. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title EVlncRNA-Dpred: improved prediction of experimentally validated lncRNAs by deep learning
topic Problem Solving Protocol
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9851331/
https://www.ncbi.nlm.nih.gov/pubmed/36573492
http://dx.doi.org/10.1093/bib/bbac583