Discovering misannotated lncRNAs using deep learning training dynamics

MOTIVATION: Recent experimental evidence has shown that some long non-coding RNAs (lncRNAs) contain small open reading frames (sORFs) that are translated into functional micropeptides, suggesting that these lncRNAs are misannotated as non-coding. Current methods to detect misannotated lncRNAs rely o...

Bibliographic Details
Main authors: Nabi, Afshan; Dilekoglu, Berke; Adebali, Ogun; Tastan, Oznur
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9825752/
https://www.ncbi.nlm.nih.gov/pubmed/36571493
http://dx.doi.org/10.1093/bioinformatics/btac821
_version_ 1784866690441412608
author Nabi, Afshan
Dilekoglu, Berke
Adebali, Ogun
Tastan, Oznur
collection PubMed
description MOTIVATION: Recent experimental evidence has shown that some long non-coding RNAs (lncRNAs) contain small open reading frames (sORFs) that are translated into functional micropeptides, suggesting that these lncRNAs are misannotated as non-coding. Current methods to detect misannotated lncRNAs rely on ribosome-profiling (Ribo-Seq) and mass-spectrometry experiments, which are cell-type dependent and expensive. RESULTS: Here, we propose a computational method to identify possible misannotated lncRNAs from sequence information alone. Our approach first builds deep learning models to discriminate coding and non-coding transcripts and then leverages these models’ training dynamics to identify misannotated lncRNAs, i.e. lncRNAs with coding potential. The set of misannotated lncRNAs we identified overlaps significantly with experimentally validated ones and closely resembles coding protein sequences, as evidenced by significant BLAST hits. Our analysis of a subset of misannotated lncRNA candidates also shows that some of the ORFs they contain yield high-confidence folded structures as predicted by AlphaFold2. This methodology offers promising potential for assisting experimental efforts in characterizing the hidden proteome encoded by misannotated lncRNAs and for curating better datasets for building coding potential predictors. AVAILABILITY AND IMPLEMENTATION: Source code is available at https://github.com/nabiafshan/DetectingMisannotatedLncRNAs. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-9825752
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-9825752 2023-01-10 Discovering misannotated lncRNAs using deep learning training dynamics Nabi, Afshan; Dilekoglu, Berke; Adebali, Ogun; Tastan, Oznur Bioinformatics Original Paper Oxford University Press 2022-12-26 /pmc/articles/PMC9825752/ /pubmed/36571493 http://dx.doi.org/10.1093/bioinformatics/btac821 Text en © The Author(s) 2022. Published by Oxford University Press.
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title Discovering misannotated lncRNAs using deep learning training dynamics
topic Original Paper