
PlncRNA-HDeep: plant long noncoding RNA prediction using hybrid deep learning based on two encoding styles

BACKGROUND: Long noncoding RNAs (lncRNAs) play an important role in regulating biological activities and their prediction is significant for exploring biological processes. Long short-term memory (LSTM) and convolutional neural network (CNN) can automatically extract and learn the abstract informati...


Bibliographic Details
Main authors: Meng, Jun, Kang, Qiang, Chang, Zheng, Luan, Yushi
Format: Online Article Text
Language: English
Published: BioMed Central 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8114701/
https://www.ncbi.nlm.nih.gov/pubmed/33980138
http://dx.doi.org/10.1186/s12859-020-03870-2
_version_ 1783691103942213632
author Meng, Jun
Kang, Qiang
Chang, Zheng
Luan, Yushi
collection PubMed
description BACKGROUND: Long noncoding RNAs (lncRNAs) play an important role in regulating biological activities, and predicting them is important for exploring biological processes. Long short-term memory (LSTM) networks and convolutional neural networks (CNNs) can automatically extract and learn abstract information from encoded RNA sequences, which avoids complex feature engineering. An ensemble model learns information from multiple perspectives and shows better performance than a single model. It is therefore feasible to treat an RNA sequence as a sentence to train an LSTM and as an image to train a CNN, and then to hybridize the trained models to predict lncRNAs. Various lncRNA predictors exist to date, but few have been proposed for plants, so a reliable and powerful predictor for plant lncRNAs is needed. RESULTS: To improve lncRNA prediction performance, this paper proposes a hybrid deep learning model based on two encoding styles (PlncRNA-HDeep), which requires no prior knowledge and uses only RNA sequences to train the models for predicting plant lncRNAs. It not only learns diverse information from RNA sequences encoded by p-nucleotide and one-hot encodings, but also takes advantage of both CNN and lncRNA-LSTM, which was proposed in our previous study. The parameters are tuned and three hybrid strategies are tested to maximize its performance. Experimental results show that PlncRNA-HDeep is more effective than either lncRNA-LSTM or CNN, obtaining 97.9% sensitivity, 95.1% precision, 96.5% accuracy and a 96.5% F1 score on the Zea mays dataset. These results are better than those of several shallow machine learning methods (support vector machine, random forest, k-nearest neighbor, decision tree, naive Bayes and logistic regression) and some existing tools (CNCI, PLEK, CPC2, LncADeep and lncRNAnet). CONCLUSIONS: PlncRNA-HDeep is feasible and obtains credible predictive results. It may also provide a valuable reference for other related research.
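The two encoding styles named in the abstract can be illustrated with a minimal Python sketch. This record does not define either encoding precisely, so the details here are assumptions: the function names `one_hot_encode` and `p_nucleotide_encode` are hypothetical, p-nucleotide encoding is assumed to mean splitting the sequence into non-overlapping words of length p and mapping each word to an integer id (a sentence-like view suited to an LSTM), and one-hot encoding is assumed to produce one 4-element row per nucleotide (an image-like view suited to a CNN). It is not the authors' implementation.

```python
from itertools import product

ALPHABET = "ACGU"  # assumed RNA alphabet; T is mapped to U below


def one_hot_encode(seq):
    """Return one 4-element row per nucleotide (image-like matrix)."""
    index = {base: i for i, base in enumerate(ALPHABET)}
    rows = []
    for base in seq.upper().replace("T", "U"):
        row = [0, 0, 0, 0]
        if base in index:  # unknown symbols such as N stay all-zero
            row[index[base]] = 1
        rows.append(row)
    return rows


def p_nucleotide_encode(seq, p=3):
    """Split seq into non-overlapping p-length words and map each to an id.

    Ids start at 1; 0 is reserved for words with unknown symbols.
    A trailing fragment shorter than p is dropped (an assumption).
    """
    vocab = {"".join(w): i + 1
             for i, w in enumerate(product(ALPHABET, repeat=p))}
    seq = seq.upper().replace("T", "U")
    words = [seq[i:i + p] for i in range(0, len(seq) - p + 1, p)]
    return [vocab.get(w, 0) for w in words]
```

For example, `p_nucleotide_encode("AAAACG", p=3)` splits the sequence into the words `AAA` and `ACG` and returns their integer ids, while `one_hot_encode("ACGU")` returns a 4×4 identity-like matrix. In a hybrid setup of the kind the abstract describes, the integer sequence would feed the LSTM branch and the matrix would feed the CNN branch.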
format Online
Article
Text
id pubmed-8114701
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-8114701 2021-05-12 PlncRNA-HDeep: plant long noncoding RNA prediction using hybrid deep learning based on two encoding styles Meng, Jun Kang, Qiang Chang, Zheng Luan, Yushi BMC Bioinformatics Research BioMed Central 2021-05-12 /pmc/articles/PMC8114701/ /pubmed/33980138 http://dx.doi.org/10.1186/s12859-020-03870-2 Text en © The Author(s) 2021 https://creativecommons.org/licenses/by/4.0/ Open Access. This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
title PlncRNA-HDeep: plant long noncoding RNA prediction using hybrid deep learning based on two encoding styles
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8114701/
https://www.ncbi.nlm.nih.gov/pubmed/33980138
http://dx.doi.org/10.1186/s12859-020-03870-2