Gene2vec: gene subsequence embedding for prediction of mammalian N(6)-methyladenosine sites from mRNA

N(6)-Methyladenosine (m(6)A) refers to methylation of the adenosine nucleotide at the nitrogen-6 position. Many conventional computational methods for identifying N(6)-methyladenosine sites are limited by the small amount of data available. Taking advantage of the thousands of m(6)...

Bibliographic Details
Main Authors: Zou, Quan; Xing, Pengwei; Wei, Leyi; Liu, Bin
Format: Online Article Text
Language: English
Published: Cold Spring Harbor Laboratory Press 2019
Subjects: Bioinformatics
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6348985/
https://www.ncbi.nlm.nih.gov/pubmed/30425123
http://dx.doi.org/10.1261/rna.069112.118
author Zou, Quan
Xing, Pengwei
Wei, Leyi
Liu, Bin
collection PubMed
description N(6)-Methyladenosine (m(6)A) refers to methylation of the adenosine nucleotide at the nitrogen-6 position. Many conventional computational methods for identifying N(6)-methyladenosine sites are limited by the small amount of data available. Taking advantage of the thousands of m(6)A sites detected by high-throughput sequencing, it is now possible to discover the characteristics of m(6)A sequences using deep learning techniques. To the best of our knowledge, our work is the first attempt to use word embedding and deep neural networks for m(6)A prediction from mRNA sequences. Using four deep neural networks, we developed a model inferred from a larger sequence shifting window that can predict m(6)A accurately and robustly. Four prediction schemes were built with various RNA sequence representations and optimized convolutional neural networks. The soft voting results from the four deep networks were shown to outperform all of the state-of-the-art methods. We evaluated these predictors on a rigorous independent test data set and showed that our proposed method outperforms the state-of-the-art predictors. The training, independent, and cross-species testing data sets are much larger than in previous studies, which could help to avoid the problem of overfitting. Furthermore, an online prediction web server implementing the four proposed predictors has been built and is available at http://server.malab.cn/Gene2vec/.
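The description mentions two ideas that a short sketch may help make concrete: treating subsequences of an RNA window as "words" for embedding, and combining four networks by soft voting. The Python below is only an illustration under stated assumptions, not the authors' implementation. The function names `kmer_tokens` and `soft_vote`, the k-mer length of 3, the 0.5 decision threshold, and the example probabilities are all hypothetical and do not come from the record.

```python
from typing import List, Sequence, Tuple


def kmer_tokens(seq: str, k: int = 3) -> List[str]:
    """Split a sequence into overlapping k-mers, the "words" an
    embedding model could be trained on. k=3 is an assumed value."""
    seq = seq.upper()
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def soft_vote(probabilities: Sequence[float],
              threshold: float = 0.5) -> Tuple[float, bool]:
    """Soft voting: average the per-model probabilities for the positive
    class, then compare the mean with a threshold (assumed to be 0.5)."""
    if not probabilities:
        raise ValueError("need at least one model probability")
    mean = sum(probabilities) / len(probabilities)
    return mean, mean >= threshold


# Hypothetical inputs: a 5-nt window and made-up outputs of four models.
tokens = kmer_tokens("GGACU")
mean_prob, is_m6a = soft_vote([0.9, 0.6, 0.4, 0.7])
print(tokens, is_m6a)
```

Soft voting averages probabilities rather than counting class votes, so one confident model can outweigh a hesitant one; whether the original predictors weight their four networks equally is not stated in this record.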
format Online
Article
Text
id pubmed-6348985
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Cold Spring Harbor Laboratory Press
record_format MEDLINE/PubMed
spelling pubmed-6348985 2020-02-01 Gene2vec: gene subsequence embedding for prediction of mammalian N(6)-methyladenosine sites from mRNA Zou, Quan; Xing, Pengwei; Wei, Leyi; Liu, Bin RNA Bioinformatics
Cold Spring Harbor Laboratory Press 2019-02 /pmc/articles/PMC6348985/ /pubmed/30425123 http://dx.doi.org/10.1261/rna.069112.118 Text en © 2019 Zou et al.; Published by Cold Spring Harbor Laboratory Press for the RNA Society http://creativecommons.org/licenses/by-nc/4.0/ This article is distributed exclusively by the RNA Society for the first 12 months after the full-issue publication date (see http://rnajournal.cshlp.org/site/misc/terms.xhtml). After 12 months, it is available under a Creative Commons License (Attribution-NonCommercial 4.0 International), as described at http://creativecommons.org/licenses/by-nc/4.0/.
title Gene2vec: gene subsequence embedding for prediction of mammalian N(6)-methyladenosine sites from mRNA
topic Bioinformatics