PreLnc: An Accurate Tool for Predicting lncRNAs Based on Multiple Features
Main authors: | Cao, Lei; Wang, Yupeng; Bi, Changwei; Ye, Qiaolin; Yin, Tongming; Ye, Ning |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | MDPI, 2020 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7563287/ https://www.ncbi.nlm.nih.gov/pubmed/32842486 http://dx.doi.org/10.3390/genes11090981 |
author | Cao, Lei; Wang, Yupeng; Bi, Changwei; Ye, Qiaolin; Yin, Tongming; Ye, Ning |
collection | PubMed |
description | Accumulating evidence indicates that long non-coding RNAs (lncRNAs) share certain similarities with messenger RNAs (mRNAs) and are involved in numerous important biological processes, so methods are needed to distinguish the two. A variety of methods based on machine learning algorithms have been developed to identify lncRNAs, providing important foundational data for subsequent studies. However, many tools lack scalability, versatility, and balance, and some depend on genome sequence and annotation. In this paper, we propose PreLnc, a convenient and accurate tool that builds prediction models from high-confidence lncRNA and mRNA transcripts using feature selection and classifiers. The false discovery rate (FDR)-adjusted p-value and the Z-value were used to analyze the tri-nucleotide composition of transcripts from different species. The experiments showed significant differences in RNA transcripts among plants, which may be related to evolutionary conservation and to the fact that plants have been under evolutionary pressure for longer than animals. We combined the Pearson correlation coefficient with the incremental feature selection (IFS) method and compared multiple classifiers to build the model. The final classifier was a balanced random forest, with which PreLnc achieved 91.09% accuracy on 349,186 animal and plant transcripts. In addition, PreLnc outperformed other prediction tools on standard performance measures. |
id | pubmed-7563287 |
institution | National Center for Biotechnology Information |
record_format | MEDLINE/PubMed |
journal | Genes (Basel) |
published | 2020-08-23 |
license | © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/). |
topic | Article |
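The description names two feature steps, tri-nucleotide composition and incremental feature selection (IFS) combined with the Pearson correlation coefficient, without saying how PreLnc implements them. The following is a minimal Python sketch of both ideas under stated assumptions, not PreLnc's code. It assumes overlapping tri-nucleotide frequencies over the RNA alphabet (with T read as U), and it assumes features are ranked by the absolute Pearson correlation with the class label (for example 1 = lncRNA, 0 = mRNA, a hypothetical encoding) before the IFS loop; the paper may instead use the correlation to drop mutually redundant features. The names `trinucleotide_composition`, `pearson`, `incremental_feature_selection`, and the `score` callback are hypothetical.

```python
from itertools import product
from math import sqrt
from typing import Callable, Dict, List, Sequence

# All 64 tri-nucleotides in a fixed order (AAA, AAC, ..., UUU).
# Assumption: RNA alphabet ACGU; this is not taken from PreLnc itself.
TRINUCS = ["".join(p) for p in product("ACGU", repeat=3)]


def trinucleotide_composition(seq: str) -> List[float]:
    """Frequency of each overlapping tri-nucleotide in seq (64 values).

    T is read as U; windows containing other characters (e.g. N) are skipped.
    """
    seq = seq.upper().replace("T", "U")
    counts: Dict[str, int] = dict.fromkeys(TRINUCS, 0)
    total = 0
    for i in range(len(seq) - 2):
        kmer = seq[i:i + 3]
        if kmer in counts:
            counts[kmer] += 1
            total += 1
    return [counts[k] / total if total else 0.0 for k in TRINUCS]


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient; 0.0 if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def incremental_feature_selection(
    X: List[List[float]],
    y: List[int],
    score: Callable[[List[List[float]], List[int]], float],
) -> List[int]:
    """Hypothetical IFS: rank features by |Pearson r| with the label, then
    score the top-1, top-2, ... subsets and return the best subset's indices."""
    n_features = len(X[0])
    # Assumption: correlation with the label is the ranking criterion.
    ranked = sorted(
        range(n_features),
        key=lambda j: abs(pearson([row[j] for row in X], y)),
        reverse=True,  # stable sort: ties keep their original column order
    )
    best_subset, best_score = ranked[:1], float("-inf")
    for k in range(1, n_features + 1):
        subset = ranked[:k]
        s = score([[row[j] for j in subset] for row in X], y)
        if s > best_score:  # strict '>' keeps the smaller subset on ties
            best_subset, best_score = subset, s
    return best_subset
```

In use, `X` would hold one `trinucleotide_composition` vector per transcript and `score` would be something like cross-validated accuracy of a classifier trained on the candidate columns. The record states that PreLnc's final classifier was a balanced random forest; imbalanced-learn's `BalancedRandomForestClassifier` is one available implementation of that idea, although nothing here confirms it is the one PreLnc uses. Because IFS fits one model per prefix of the ranking, the correlation ranking is computed once up front rather than inside the loop.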