RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction
Non-coding RNAs (ncRNAs) are important players in the cellular regulation of organisms from different kingdoms. One of the key steps in ncRNAs research is the ability to distinguish coding/non-coding sequences. We applied seven machine learning algorithms (Naive Bayes, Support Vector Machine, K-Near...
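Main Authors: | Ramos, Thaís A.R.; Galindo, Nilbson R.O.; Arias-Carrasco, Raúl; da Silva, Cecília F.; Maracaja-Coutinho, Vinicius; do Rêgo, Thaís G. |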
---|---|
Format: | Online Article Text |
Language: | English |
Published: | F1000 Research Limited 2021 |
Subjects: | Software Tool Article |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8201426/ https://www.ncbi.nlm.nih.gov/pubmed/34164114 http://dx.doi.org/10.12688/f1000research.52350.2 |
_version_ | 1783707814059835392 |
---|---|
author | Ramos, Thaís A.R. Galindo, Nilbson R.O. Arias-Carrasco, Raúl da Silva, Cecília F. Maracaja-Coutinho, Vinicius do Rêgo, Thaís G. |
author_facet | Ramos, Thaís A.R. Galindo, Nilbson R.O. Arias-Carrasco, Raúl da Silva, Cecília F. Maracaja-Coutinho, Vinicius do Rêgo, Thaís G. |
author_sort | Ramos, Thaís A.R. |
collection | PubMed |
description | Non-coding RNAs (ncRNAs) are important players in the cellular regulation of organisms from different kingdoms. One of the key steps in ncRNA research is distinguishing coding from non-coding sequences. We applied seven machine learning algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Random Forest, Extreme Gradient Boosting, Neural Networks and Deep Learning) across model organisms from different evolutionary branches to create a stand-alone and web server tool (RNAmining) that distinguishes coding and non-coding sequences. First, we used coding and non-coding sequences downloaded from Ensembl (April 14th, 2020). The coding and non-coding sets were then balanced, their trinucleotide counts were computed (64 features) and normalized by sequence length, resulting in a total of 180 models. The machine learning algorithms were validated using 10-fold cross-validation, and we selected the algorithm with the best results (eXtreme Gradient Boosting) to implement in RNAmining. The best F1-scores ranged from 97.56% to 99.57%, depending on the organism. Moreover, we benchmarked RNAmining against other tools from the literature (CPAT, CPC2, RNAcon and TransDecoder), and our results outperformed them. Both stand-alone and web server versions of RNAmining are freely available at https://rnamining.integrativebioinformatics.me/. |
format | Online Article Text |
id | pubmed-8201426 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | F1000 Research Limited |
record_format | MEDLINE/PubMed |
spelling | pubmed-82014262021-06-22 RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction Ramos, Thaís A.R. Galindo, Nilbson R.O. Arias-Carrasco, Raúl da Silva, Cecília F. Maracaja-Coutinho, Vinicius do Rêgo, Thaís G. F1000Res Software Tool Article Non-coding RNAs (ncRNAs) are important players in the cellular regulation of organisms from different kingdoms. One of the key steps in ncRNAs research is the ability to distinguish coding/non-coding sequences. We applied seven machine learning algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Random Forest, Extreme Gradient Boosting, Neural Networks and Deep Learning) through model organisms from different evolutionary branches to create a stand-alone and web server tool (RNAmining) to distinguish coding and non-coding sequences. Firstly, we used coding/non-coding sequences downloaded from Ensembl (April 14th, 2020). Then, coding/non-coding sequences were balanced, had their trinucleotides count analysed (64 features) and we performed a normalization by the sequence length, resulting in total of 180 models. The machine learning algorithms validations were performed using 10-fold cross-validation and we selected the algorithm with the best results (eXtreme Gradient Boosting) to implement at RNAmining. Best F1-scores ranged from 97.56% to 99.57% depending on the organism. Moreover, we produced a benchmarking with other tools already in literature (CPAT, CPC2, RNAcon and TransDecoder) and our results outperformed them. Both stand-alone and web server versions of RNAmining are freely available at https://rnamining.integrativebioinformatics.me/. F1000 Research Limited 2021-06-08 /pmc/articles/PMC8201426/ /pubmed/34164114 http://dx.doi.org/10.12688/f1000research.52350.2 Text en Copyright: © 2021 Ramos TAR et al. 
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Software Tool Article Ramos, Thaís A.R. Galindo, Nilbson R.O. Arias-Carrasco, Raúl da Silva, Cecília F. Maracaja-Coutinho, Vinicius do Rêgo, Thaís G. RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction |
title | RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction |
title_full | RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction |
title_fullStr | RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction |
title_full_unstemmed | RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction |
title_short | RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction |
title_sort | rnamining: a machine learning stand-alone and web server tool for rna coding potential prediction |
topic | Software Tool Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8201426/ https://www.ncbi.nlm.nih.gov/pubmed/34164114 http://dx.doi.org/10.12688/f1000research.52350.2 |
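
The description above mentions trinucleotide counts (64 features) normalized by sequence length. The following is a minimal sketch of how such a feature vector could be computed. It is not the RNAmining implementation. The function name `trinucleotide_features`, the overlapping sliding window, the conversion of U to T, and the skipping of windows with ambiguous bases are all assumptions made for illustration, since the record does not specify them.

```python
from itertools import product
from typing import List

# All 64 possible trinucleotides over A, C, G, T, in a fixed order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]
INDEX = {kmer: i for i, kmer in enumerate(TRINUCLEOTIDES)}


def trinucleotide_features(sequence: str) -> List[float]:
    """Return 64 trinucleotide counts divided by the sequence length.

    Assumptions (not taken from the record): overlapping windows with
    step 1, RNA 'U' treated as 'T', and windows containing any other
    symbol (for example 'N') ignored.
    """
    seq = sequence.upper().replace("U", "T")
    counts = [0] * len(TRINUCLEOTIDES)
    for i in range(len(seq) - 2):
        idx = INDEX.get(seq[i:i + 3])
        if idx is not None:
            counts[idx] += 1
    length = len(seq)
    if length == 0:
        # Avoid division by zero for empty input.
        return [0.0] * len(TRINUCLEOTIDES)
    return [c / length for c in counts]


if __name__ == "__main__":
    features = trinucleotide_features("AUGAUG")
    print(len(features))            # number of features
    print(features[INDEX["ATG"]])   # normalized frequency of ATG
```

A vector like this, computed per sequence and paired with a coding/non-coding label, is the kind of input a gradient boosting classifier could be trained on under 10-fold cross-validation.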