
Interpretable Deep Learning Model Reveals Subsequences of Various Functions for Long Non-Coding RNA Identification

Long non-coding RNAs (lncRNAs) play crucial roles in many biological processes and are implicated in several diseases. With next-generation sequencing technologies, a substantial number of unannotated transcripts have been discovered. Classifying unannotated transcripts using biological experiments is more...


Bibliographic Details
Main authors:	Lin, Rattaphon; Wichadakul, Duangdao
Format:	Online Article Text
Language:	English
Published:	Frontiers Media S.A. 2022
Subjects:	Genetics
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9173695/ https://www.ncbi.nlm.nih.gov/pubmed/35685437 http://dx.doi.org/10.3389/fgene.2022.876721

author	Lin, Rattaphon Wichadakul, Duangdao
collection	PubMed
description	Long non-coding RNAs (lncRNAs) play crucial roles in many biological processes and are implicated in several diseases. With next-generation sequencing technologies, a substantial number of unannotated transcripts have been discovered. Classifying unannotated transcripts using biological experiments is more time-consuming and expensive than using computational approaches. Several tools are available for identifying long non-coding RNAs. These tools, however, do not explain which features contributed to their prediction results. Here, we present Xlnc1DCNN, a tool for distinguishing lncRNAs from protein-coding transcripts (PCTs) using a one-dimensional convolutional neural network with prediction explanations. Evaluation on the human test set showed that Xlnc1DCNN outperformed other state-of-the-art tools in terms of accuracy and F1-score. The explanation results revealed that lncRNA transcripts were mainly identified as sequences with no conserved regions, short patterns with unknown functions, or only regions of transmembrane helices, while protein-coding transcripts were mostly classified by conserved protein domains or families. The explanation results also pointed to probable annotation inconsistencies among the public databases: lncRNA transcripts that contain protein domains, protein families, or intrinsically disordered regions (IDRs). Xlnc1DCNN is freely available at https://github.com/cucpbioinfo/Xlnc1DCNN.
format	Online Article Text
id	pubmed-9173695
institution	National Center for Biotechnology Information
language	English
publishDate	2022
publisher	Frontiers Media S.A.
record_format	MEDLINE/PubMed
spelling	pubmed-9173695 2022-06-08 Interpretable Deep Learning Model Reveals Subsequences of Various Functions for Long Non-Coding RNA Identification Lin, Rattaphon Wichadakul, Duangdao Front Genet Genetics Frontiers Media S.A. 2022-05-24 /pmc/articles/PMC9173695/ /pubmed/35685437 http://dx.doi.org/10.3389/fgene.2022.876721 Text en Copyright © 2022 Lin and Wichadakul.
https://creativecommons.org/licenses/by/4.0/ This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
title	Interpretable Deep Learning Model Reveals Subsequences of Various Functions for Long Non-Coding RNA Identification
topic	Genetics
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9173695/ https://www.ncbi.nlm.nih.gov/pubmed/35685437 http://dx.doi.org/10.3389/fgene.2022.876721
