Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci

Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of Conv...

Bibliographic Details
Main authors: Georgakilas, Georgios K., Grioni, Andrea, Liakos, Konstantinos G., Chalupova, Eliska, Plessas, Fotis C., Alexiou, Panagiotis
Format: Online Article Text
Language: English
Published: Nature Publishing Group UK 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7289789/
https://www.ncbi.nlm.nih.gov/pubmed/32528107
http://dx.doi.org/10.1038/s41598-020-66454-3
author Georgakilas, Georgios K.
Grioni, Andrea
Liakos, Konstantinos G.
Chalupova, Eliska
Plessas, Fotis C.
Alexiou, Panagiotis
author_facet Georgakilas, Georgios K.
Grioni, Andrea
Liakos, Konstantinos G.
Chalupova, Eliska
Plessas, Fotis C.
Alexiou, Panagiotis
author_sort Georgakilas, Georgios K.
collection PubMed
description Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of Convolutional Neural Networks that can learn patterns associated with user-defined sets of genomic regions, and scan large genomic areas for novel regions exhibiting similar characteristics. We demonstrate that MuStARD is a generic method that can be trained on different classes of human small RNA genomic loci, without the need for domain-specific knowledge, due to the automated feature and background selection processes built into the model. We also demonstrate the ability of MuStARD for inter-species identification of functional elements by predicting mouse small RNAs (pre-miRNAs and snoRNAs) using models trained on the human genome. MuStARD can be used to filter small RNA-Seq datasets for identification of novel small RNA loci, intra- and inter-species, as demonstrated in three use cases of human, mouse, and fly pre-miRNA prediction. MuStARD is easy to deploy and extend to a variety of genomic classification questions. Code and trained models are freely available at gitlab.com/RBP_Bioinformatics/mustard.
format Online
Article
Text
id pubmed-7289789
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher Nature Publishing Group UK
record_format MEDLINE/PubMed
spelling pubmed-72897892020-06-15 Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci Georgakilas, Georgios K. Grioni, Andrea Liakos, Konstantinos G. Chalupova, Eliska Plessas, Fotis C. Alexiou, Panagiotis Sci Rep Article Genomic regions that encode small RNA genes exhibit characteristic patterns in their sequence, secondary structure, and evolutionary conservation. Convolutional Neural Networks are a family of algorithms that can classify data based on learned patterns. Here we present MuStARD, an application of Convolutional Neural Networks that can learn patterns associated with user-defined sets of genomic regions, and scan large genomic areas for novel regions exhibiting similar characteristics. We demonstrate that MuStARD is a generic method that can be trained on different classes of human small RNA genomic loci, without the need for domain-specific knowledge, due to the automated feature and background selection processes built into the model. We also demonstrate the ability of MuStARD for inter-species identification of functional elements by predicting mouse small RNAs (pre-miRNAs and snoRNAs) using models trained on the human genome. MuStARD can be used to filter small RNA-Seq datasets for identification of novel small RNA loci, intra- and inter-species, as demonstrated in three use cases of human, mouse, and fly pre-miRNA prediction. MuStARD is easy to deploy and extend to a variety of genomic classification questions. Code and trained models are freely available at gitlab.com/RBP_Bioinformatics/mustard.
Nature Publishing Group UK 2020-06-11 /pmc/articles/PMC7289789/ /pubmed/32528107 http://dx.doi.org/10.1038/s41598-020-66454-3 Text en © The Author(s) 2020 Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this license, visit http://creativecommons.org/licenses/by/4.0/.
spellingShingle Article
Georgakilas, Georgios K.
Grioni, Andrea
Liakos, Konstantinos G.
Chalupova, Eliska
Plessas, Fotis C.
Alexiou, Panagiotis
Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
title Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
title_full Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
title_fullStr Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
title_full_unstemmed Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
title_short Multi-branch Convolutional Neural Network for Identification of Small Non-coding RNA genomic loci
title_sort multi-branch convolutional neural network for identification of small non-coding rna genomic loci
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7289789/
https://www.ncbi.nlm.nih.gov/pubmed/32528107
http://dx.doi.org/10.1038/s41598-020-66454-3