PreLnc: An Accurate Tool for Predicting lncRNAs Based on Multiple Features

Accumulating evidence indicates that long non-coding RNAs (lncRNAs) have certain similarities with messenger RNAs (mRNAs) and are associated with numerous important biological processes, thereby demanding methods to distinguish them. Based on machine learning algorithms, a variety of methods are dev...

Bibliographic Details
Main authors: Cao, Lei, Wang, Yupeng, Bi, Changwei, Ye, Qiaolin, Yin, Tongming, Ye, Ning
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7563287/
https://www.ncbi.nlm.nih.gov/pubmed/32842486
http://dx.doi.org/10.3390/genes11090981
author Cao, Lei
Wang, Yupeng
Bi, Changwei
Ye, Qiaolin
Yin, Tongming
Ye, Ning
collection PubMed
description Accumulating evidence indicates that long non-coding RNAs (lncRNAs) share certain similarities with messenger RNAs (mRNAs) and are associated with numerous important biological processes, so methods are needed to distinguish the two. A variety of machine-learning methods have been developed to identify lncRNAs, providing basic data support for subsequent studies. However, many tools lack scalability, versatility and balance, and some rely on genome sequence and annotation. In this paper, we propose a convenient and accurate tool, “PreLnc”, which uses high-confidence lncRNA and mRNA transcripts to build prediction models through feature selection and classifiers. The false discovery rate (FDR) adjusted p-value and Z-value were used to analyze the tri-nucleotide composition of transcripts from different species. The experiments showed significant differences in RNA transcripts among plants, which may be related to evolutionary conservation and to plants having been under evolutionary pressure for longer than animals. The model was built by combining the Pearson correlation coefficient with the incremental feature selection (IFS) method and comparing multiple classifiers. Finally, a balanced random forest was used to construct the classifier, and PreLnc achieved 91.09% accuracy on 349,186 animal and plant transcripts. In addition, on standard performance measures, PreLnc performed better than other prediction tools.
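The tri-nucleotide analysis mentioned in the description can be illustrated with a minimal sketch. It is not taken from PreLnc itself: the function names, the use of overlapping 3-mer windows, and the choice of the Benjamini-Hochberg procedure for the FDR adjustment are all assumptions made for illustration, since the record does not state which variants the authors used.

```python
from itertools import product


def trinucleotide_composition(seq):
    """Relative frequency of each of the 64 tri-nucleotides in seq.

    Assumption: overlapping windows with step 1; the record does not say
    whether the authors counted overlapping or in-frame triplets.
    """
    seq = seq.upper().replace("U", "T")  # treat RNA and DNA alphabets alike
    kmers = ["".join(p) for p in product("ACGT", repeat=3)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:  # skip windows with N or other ambiguity codes
            counts[kmer] += 1
            total += 1
    return {k: (c / total if total else 0.0) for k, c in counts.items()}


def benjamini_hochberg(pvalues):
    """FDR-adjusted p-values (Benjamini-Hochberg), in the input order.

    Assumption: BH is used here only as a common FDR procedure.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvalues[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted
```

In a workflow of this kind, each transcript would be turned into a 64-value composition vector, a per-tri-nucleotide test between lncRNA and mRNA groups would yield 64 p-values, and those p-values would then be adjusted before deciding which tri-nucleotides differ significantly.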
format Online
Article
Text
id pubmed-7563287
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-7563287 2020-10-27 PreLnc: An Accurate Tool for Predicting lncRNAs Based on Multiple Features Cao, Lei Wang, Yupeng Bi, Changwei Ye, Qiaolin Yin, Tongming Ye, Ning Genes (Basel) Article MDPI 2020-08-23 /pmc/articles/PMC7563287/ /pubmed/32842486 http://dx.doi.org/10.3390/genes11090981 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
title PreLnc: An Accurate Tool for Predicting lncRNAs Based on Multiple Features
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7563287/
https://www.ncbi.nlm.nih.gov/pubmed/32842486
http://dx.doi.org/10.3390/genes11090981