
Sequence-specific bias correction for RNA-seq data using recurrent neural networks


Bibliographic Details
Main Authors: Zhang, Yao-zhong; Yamaguchi, Rui; Imoto, Seiya; Miyano, Satoru
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5310274/
https://www.ncbi.nlm.nih.gov/pubmed/28198674
http://dx.doi.org/10.1186/s12864-016-3262-5
Description
Summary:
BACKGROUND: The recent success of deep learning techniques in machine learning and artificial intelligence has stimulated a great deal of interest among bioinformaticians, who now wish to bring the power of deep learning to bear on a host of bioinformatics problems. Deep learning is ideally suited for biological problems that require automatic or hierarchical feature representation for biological data when prior knowledge is limited. In this work, we address the sequence-specific bias correction problem for RNA-seq data using Recurrent Neural Networks (RNNs) to model nucleotide sequences without pre-determining sequence structures. The sequence-specific bias of a read is then calculated based on the sequence probabilities estimated by RNNs, and used in the estimation of gene abundance.
RESULTS: We explore the application of two popular RNN recurrent units for this task and demonstrate that RNN-based approaches provide a flexible way to model nucleotide sequences without knowledge of predetermined sequence structures. Our experiments show that training an RNN-based nucleotide sequence model is efficient and that RNN-based bias correction methods compare well with the state-of-the-art sequence-specific bias correction method on the commonly used MAQC-III data set.
CONCLUSIONS: RNNs provide an alternative and flexible way to calculate sequence-specific bias without explicitly pre-determining sequence structures.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12864-016-3262-5) contains supplementary material, which is available to authorized users.
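The summary says that a read's sequence-specific bias is calculated from sequence probabilities estimated by RNNs. The following is a minimal sketch of that idea and not the authors' implementation. It rests on several assumptions that do not come from the record: the `TinyRNN` class is a hypothetical, untrained Elman-style network with random weights; the sequence probability is factored by the chain rule over nucleotides; and the bias weight is taken as the ratio of a background-model probability to a foreground-model probability, a form used in earlier bias correction work that the summary does not confirm for this paper.

```python
import math
import random

NUCLEOTIDES = "ACGT"


class TinyRNN:
    """Hypothetical, untrained Elman-style RNN over nucleotides (illustration only)."""

    def __init__(self, hidden=4, seed=0):
        rng = random.Random(seed)

        def matrix(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.hidden = hidden
        self.w_xh = matrix(hidden, 4)       # input (one-hot nucleotide) -> hidden
        self.w_hh = matrix(hidden, hidden)  # hidden -> hidden
        self.w_hy = matrix(4, hidden)       # hidden -> next-nucleotide logits

    def log_prob(self, seq):
        """Return log P(seq) = sum over t of log P(x_t | x_1 .. x_{t-1})."""
        h = [0.0] * self.hidden
        total = 0.0
        for ch in seq:
            idx = NUCLEOTIDES.index(ch)
            # Softmax over the four nucleotides, given the current hidden state.
            logits = [sum(w * v for w, v in zip(row, h)) for row in self.w_hy]
            log_norm = math.log(sum(math.exp(z) for z in logits))
            total += logits[idx] - log_norm
            # A one-hot input selects column idx of w_xh.
            h = [
                math.tanh(self.w_xh[i][idx]
                          + sum(w * v for w, v in zip(self.w_hh[i], h)))
                for i in range(self.hidden)
            ]
        return total


def bias_weight(seq, foreground, background):
    """Assumed ratio form: P_background(seq) / P_foreground(seq)."""
    return math.exp(background.log_prob(seq) - foreground.log_prob(seq))


if __name__ == "__main__":
    fg = TinyRNN(seed=1)  # stands in for a model of sequences at observed read starts
    bg = TinyRNN(seed=2)  # stands in for a model of background transcript sequences
    print(bias_weight("ACGTAC", fg, bg))
```

In a real pipeline the two models would be trained on read-start and background sequences respectively, and the resulting weights would enter the gene abundance estimate; here the weights are random, so the printed value carries no biological meaning.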