RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction

Bibliographic Details
Main authors: Ramos, Thaís A.R.; Galindo, Nilbson R.O.; Arias-Carrasco, Raúl; da Silva, Cecília F.; Maracaja-Coutinho, Vinicius; do Rêgo, Thaís G.
Format: Online Article Text
Language: English
Published: F1000 Research Limited 2021
Subjects: Software Tool Article
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8201426/
https://www.ncbi.nlm.nih.gov/pubmed/34164114
http://dx.doi.org/10.12688/f1000research.52350.2
author Ramos, Thaís A.R.
Galindo, Nilbson R.O.
Arias-Carrasco, Raúl
da Silva, Cecília F.
Maracaja-Coutinho, Vinicius
do Rêgo, Thaís G.
collection PubMed
description Non-coding RNAs (ncRNAs) are important players in the cellular regulation of organisms from different kingdoms. One of the key steps in ncRNA research is distinguishing coding from non-coding sequences. We applied seven machine learning algorithms (Naive Bayes, Support Vector Machine, K-Nearest Neighbors, Random Forest, eXtreme Gradient Boosting, Neural Networks and Deep Learning) to model organisms from different evolutionary branches to create a stand-alone and web server tool (RNAmining) that distinguishes coding from non-coding sequences. First, we used coding and non-coding sequences downloaded from Ensembl (April 14th, 2020). The coding and non-coding sets were then balanced, their trinucleotide counts were computed (64 features), and the counts were normalized by sequence length, resulting in a total of 180 models. The machine learning algorithms were validated using 10-fold cross-validation, and the algorithm with the best results (eXtreme Gradient Boosting) was selected for implementation in RNAmining. The best F1-scores ranged from 97.56% to 99.57%, depending on the organism. Moreover, we benchmarked RNAmining against other tools from the literature (CPAT, CPC2, RNAcon and TransDecoder), and it outperformed them. Both the stand-alone and web server versions of RNAmining are freely available at https://rnamining.integrativebioinformatics.me/.
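The feature step named in the description (64 trinucleotide counts normalized by sequence length) can be illustrated with a short sketch. This is not the RNAmining implementation: the function name `trinucleotide_features`, the use of overlapping windows with a step of 1, the conversion of U to T, and the skipping of windows that contain ambiguous symbols are all assumptions made for illustration only.

```python
from collections import Counter
from itertools import product
from typing import List

# All 64 possible trinucleotides over the DNA alphabet, in a fixed order.
TRINUCLEOTIDES = ["".join(p) for p in product("ACGT", repeat=3)]


def trinucleotide_features(sequence: str) -> List[float]:
    """Return 64 length-normalized trinucleotide counts for one sequence.

    Assumptions (not stated in the abstract): overlapping windows with a
    step of 1, uppercase conversion, U treated as T, and windows that
    contain any other symbol (for example N) left out of the features.
    """
    seq = sequence.upper().replace("U", "T")
    length = len(seq)
    if length == 0:
        return [0.0] * len(TRINUCLEOTIDES)
    # Count every overlapping 3-letter window in the sequence.
    counts = Counter(seq[i:i + 3] for i in range(length - 2))
    # Normalize each count by the sequence length.
    return [counts[k] / length for k in TRINUCLEOTIDES]


features = trinucleotide_features("ATGGCCATTGTAATGGGCCGC")
print(len(features))  # prints 64
```

Each sequence becomes a fixed-length vector regardless of its original length, which is what allows a classifier such as gradient boosting to be trained on transcripts of very different sizes.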
format Online
Article
Text
id pubmed-8201426
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher F1000 Research Limited
record_format MEDLINE/PubMed
spelling pubmed-8201426 2021-06-22 F1000Res Software Tool Article F1000 Research Limited 2021-06-08 /pmc/articles/PMC8201426/ /pubmed/34164114 http://dx.doi.org/10.12688/f1000research.52350.2 Text en
Copyright: © 2021 Ramos TAR et al.
https://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution Licence, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title RNAmining: A machine learning stand-alone and web server tool for RNA coding potential prediction
topic Software Tool Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8201426/
https://www.ncbi.nlm.nih.gov/pubmed/34164114
http://dx.doi.org/10.12688/f1000research.52350.2