
Convolutional neural networks for classification of alignments of non-coding RNA sequences

MOTIVATION: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learn...


Bibliographic Details
Main authors: Aoki, Genta; Sakakibara, Yasubumi
Format: Online Article Text
Language: English
Published: Oxford University Press 2018
Subjects: Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6022636/
https://www.ncbi.nlm.nih.gov/pubmed/29949978
http://dx.doi.org/10.1093/bioinformatics/bty228
_version_ 1783335720588410880
author Aoki, Genta
Sakakibara, Yasubumi
collection PubMed
description MOTIVATION: The convolutional neural network (CNN) has been applied to the classification problem of DNA sequences, with the additional purpose of motif discovery. The training of CNNs with distributed representations of four nucleotides has successfully derived position weight matrices on the learned kernels that corresponded to sequence motifs such as protein-binding sites. RESULTS: We propose a novel application of CNNs to classification of pairwise alignments of sequences for accurate clustering of sequences and show the benefits of the CNN method of inputting pairwise alignments for clustering of non-coding RNA (ncRNA) sequences and for motif discovery. Classification of a pairwise alignment of two sequences into positive and negative classes corresponds to the clustering of the input sequences. After we combined the distributed representation of RNA nucleotides with the secondary-structure information specific to ncRNAs and furthermore with mapping profiles of next-generation sequence reads, the training of CNNs for classification of alignments of RNA sequences yielded accurate clustering in terms of ncRNA families and outperformed the existing clustering methods for ncRNA sequences. Several interesting sequence motifs and secondary-structure motifs known for the snoRNA family and specific to microRNA and tRNA families were identified. AVAILABILITY AND IMPLEMENTATION: The source code of our CNN software in the deep-learning framework Chainer is available at http://www.dna.bio.keio.ac.jp/cnn/, and the dataset used for performance evaluation in this work is available at the same URL.
format Online
Article
Text
id pubmed-6022636
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6022636 2018-07-10 Convolutional neural networks for classification of alignments of non-coding RNA sequences Aoki, Genta Sakakibara, Yasubumi Bioinformatics Ismb 2018–Intelligent Systems for Molecular Biology Proceedings
Oxford University Press 2018-07-01 2018-06-27 /pmc/articles/PMC6022636/ /pubmed/29949978 http://dx.doi.org/10.1093/bioinformatics/bty228 Text en © The Author(s) 2018. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title Convolutional neural networks for classification of alignments of non-coding RNA sequences
topic Ismb 2018–Intelligent Systems for Molecular Biology Proceedings