RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences

Bibliographic Details
Main authors: Camargo, Antonio P, Sourkov, Vsevolod, Pereira, Gonçalo A G, Carazzolle, Marcelo F
Format: Online Article Text
Language: English
Published: Oxford University Press 2020
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671399/
https://www.ncbi.nlm.nih.gov/pubmed/33575571
http://dx.doi.org/10.1093/nargab/lqz024
author Camargo, Antonio P
Sourkov, Vsevolod
Pereira, Gonçalo A G
Carazzolle, Marcelo F
collection PubMed
description The advent of high-throughput sequencing technologies has made it possible to obtain large volumes of genetic information quickly and inexpensively. Consequently, many efforts are devoted to unveiling the biological roles of genomic elements, and distinguishing protein-coding RNAs from long non-coding RNAs is one of the most important of these tasks. We describe RNAsamba, a tool that predicts the coding potential of RNA molecules from sequence information using a neural network-based model that considers both the whole sequence and the open reading frame (ORF) to identify patterns that distinguish coding from non-coding transcripts. We evaluated RNAsamba's classification performance on transcripts from humans and several other model organisms and show that it repeatedly outperforms other state-of-the-art methods. Our results also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, indicating that its algorithm does not depend on complete transcript sequences. Furthermore, RNAsamba can predict small ORFs, which are traditionally identified through ribosome profiling experiments. We believe that RNAsamba will enable faster and more accurate biological findings from genomic data of species that are being sequenced for the first time. A user-friendly web interface, documentation with instructions for local installation and usage, and the source code of RNAsamba are available at https://rnasamba.lge.ibi.unicamp.br/.
format Online
Article
Text
id pubmed-7671399
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-7671399 2021-02-10
RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences
Camargo, Antonio P; Sourkov, Vsevolod; Pereira, Gonçalo A G; Carazzolle, Marcelo F
NAR Genom Bioinform, Methods Article
Oxford University Press 2020-01-13
/pmc/articles/PMC7671399/ /pubmed/33575571 http://dx.doi.org/10.1093/nargab/lqz024
Text en
© The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics.
http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences
topic Methods Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671399/
https://www.ncbi.nlm.nih.gov/pubmed/33575571
http://dx.doi.org/10.1093/nargab/lqz024