MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding

Bibliographic Details
Main Authors: Dunkel, Heiko, Wehrmann, Henning, Jensen, Lars R., Kuss, Andreas W., Simm, Stefan
Format: Online Article Text
Language: English
Published: MDPI 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10218863/
https://www.ncbi.nlm.nih.gov/pubmed/37240230
http://dx.doi.org/10.3390/ijms24108884
author Dunkel, Heiko
Wehrmann, Henning
Jensen, Lars R.
Kuss, Andreas W.
Simm, Stefan
collection PubMed
description Non-coding RNA (ncRNA) classes perform important housekeeping and regulatory functions and are quite heterogeneous in length, sequence conservation and secondary structure. High-throughput sequencing reveals novel expressed ncRNAs, and their classification is important for understanding cell regulation and identifying potential diagnostic and therapeutic biomarkers. To improve the classification of ncRNAs, we investigated different approaches to using primary sequences and secondary structures, as well as the late integration of both, using machine learning models that include different neural network architectures. As input, we used the newest version of RNAcentral, focusing on six ncRNA classes: lncRNA, rRNA, tRNA, miRNA, snRNA and snoRNA. The late integration of graph-encoded structural features and primary sequences in our MncR classifier achieved an overall accuracy of >97%, which could not be increased by more fine-grained subclassification. Compared with the currently best-performing tool, ncRDense, we obtained a minimal increase of 0.5% in all four overlapping ncRNA classes on a similar test set of sequences. In summary, MncR is not only more accurate than current ncRNA prediction tools but also allows the prediction of long ncRNA classes (lncRNAs, certain rRNAs) of up to 12,000 nt and is trained on a more diverse ncRNA dataset retrieved from RNAcentral.
format Online
Article
Text
id pubmed-10218863
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-10218863 2023-05-27 MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding Dunkel, Heiko Wehrmann, Henning Jensen, Lars R. Kuss, Andreas W. Simm, Stefan Int J Mol Sci Article MDPI 2023-05-17 /pmc/articles/PMC10218863/ /pubmed/37240230 http://dx.doi.org/10.3390/ijms24108884 Text en © 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
title MncR: Late Integration Machine Learning Model for Classification of ncRNA Classes Using Sequence and Structural Encoding
topic Article