RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences
The advent of high-throughput sequencing technologies has made it possible to obtain large volumes of genetic information quickly and inexpensively. Thus, many efforts are devoted to unveiling the biological roles of genomic elements, with the distinction between protein-coding and long non-coding RNA...
Main authors: | Camargo, Antonio P; Sourkov, Vsevolod; Pereira, Gonçalo A G; Carazzolle, Marcelo F |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2020 |
Subjects: | Methods Article |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671399/ https://www.ncbi.nlm.nih.gov/pubmed/33575571 http://dx.doi.org/10.1093/nargab/lqz024 |
_version_ | 1783610921883533312 |
---|---|
author | Camargo, Antonio P Sourkov, Vsevolod Pereira, Gonçalo A G Carazzolle, Marcelo F |
author_facet | Camargo, Antonio P Sourkov, Vsevolod Pereira, Gonçalo A G Carazzolle, Marcelo F |
author_sort | Camargo, Antonio P |
collection | PubMed |
description | The advent of high-throughput sequencing technologies has made it possible to obtain large volumes of genetic information quickly and inexpensively. Thus, many efforts are devoted to unveiling the biological roles of genomic elements, with the distinction between protein-coding and long non-coding RNAs being one of the most important tasks. We describe RNAsamba, a tool that predicts the coding potential of RNA molecules from sequence information using a neural network that models both the whole sequence and the ORF to identify patterns that distinguish coding from non-coding transcripts. We evaluated RNAsamba’s classification performance using transcripts from humans and several other model organisms and show that it consistently outperforms other state-of-the-art methods. Our results also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, indicating that its algorithm does not depend on complete transcript sequences. Furthermore, RNAsamba can also predict small ORFs, which are traditionally identified with ribosome profiling experiments. We believe that RNAsamba will enable faster and more accurate biological findings from genomic data of species that are being sequenced for the first time. A user-friendly web interface, documentation containing instructions for local installation and usage, and the source code of RNAsamba can be found at https://rnasamba.lge.ibi.unicamp.br/. |
format | Online Article Text |
id | pubmed-7671399 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2020 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-7671399 2021-02-10 RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences Camargo, Antonio P Sourkov, Vsevolod Pereira, Gonçalo A G Carazzolle, Marcelo F NAR Genom Bioinform Methods Article The advent of high-throughput sequencing technologies has made it possible to obtain large volumes of genetic information quickly and inexpensively. Thus, many efforts are devoted to unveiling the biological roles of genomic elements, with the distinction between protein-coding and long non-coding RNAs being one of the most important tasks. We describe RNAsamba, a tool that predicts the coding potential of RNA molecules from sequence information using a neural network that models both the whole sequence and the ORF to identify patterns that distinguish coding from non-coding transcripts. We evaluated RNAsamba’s classification performance using transcripts from humans and several other model organisms and show that it consistently outperforms other state-of-the-art methods. Our results also show that RNAsamba can identify coding signals in partial-length ORFs and UTR sequences, indicating that its algorithm does not depend on complete transcript sequences. Furthermore, RNAsamba can also predict small ORFs, which are traditionally identified with ribosome profiling experiments. We believe that RNAsamba will enable faster and more accurate biological findings from genomic data of species that are being sequenced for the first time. A user-friendly web interface, documentation containing instructions for local installation and usage, and the source code of RNAsamba can be found at https://rnasamba.lge.ibi.unicamp.br/. Oxford University Press 2020-01-13 /pmc/articles/PMC7671399/ /pubmed/33575571 http://dx.doi.org/10.1093/nargab/lqz024 Text en © The Author(s) 2019. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Methods Article Camargo, Antonio P Sourkov, Vsevolod Pereira, Gonçalo A G Carazzolle, Marcelo F RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences |
title | RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences |
title_full | RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences |
title_fullStr | RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences |
title_full_unstemmed | RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences |
title_short | RNAsamba: neural network-based assessment of the protein-coding potential of RNA sequences |
title_sort | rnasamba: neural network-based assessment of the protein-coding potential of rna sequences |
topic | Methods Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7671399/ https://www.ncbi.nlm.nih.gov/pubmed/33575571 http://dx.doi.org/10.1093/nargab/lqz024 |
work_keys_str_mv | AT camargoantoniop rnasambaneuralnetworkbasedassessmentoftheproteincodingpotentialofrnasequences AT sourkovvsevolod rnasambaneuralnetworkbasedassessmentoftheproteincodingpotentialofrnasequences AT pereiragoncaloag rnasambaneuralnetworkbasedassessmentoftheproteincodingpotentialofrnasequences AT carazzollemarcelof rnasambaneuralnetworkbasedassessmentoftheproteincodingpotentialofrnasequences |