
A deep learning method for lincRNA detection using auto-encoder algorithm

BACKGROUND: The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by new deep learning methods. By scanning the newly found data...


Bibliographic Details
Main Authors: Yu, Ning, Yu, Zeng, Pan, Yi
Format: Online Article Text
Language: English
Published: BioMed Central 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731497/
https://www.ncbi.nlm.nih.gov/pubmed/29244011
http://dx.doi.org/10.1186/s12859-017-1922-3
_version_ 1783286521589137408
author Yu, Ning
Yu, Zeng
Pan, Yi
author_facet Yu, Ning
Yu, Zeng
Pan, Yi
author_sort Yu, Ning
collection PubMed
description BACKGROUND: The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering more unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by new deep learning methods. By scanning the newly found data sets from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two findings provide the rationale for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in biological processes as regulatory units instead of generating proteins. Identifying these transcriptional regions within non-coding regions is the first step towards lincRNA recognition. RESULTS: The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, the method is validated on the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has broad capability for lincRNA prediction. CONCLUSIONS: The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data.
Subsequently, a two-layer deep neural network is developed for lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by the newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application of deep learning techniques to identifying lincRNA transcription sequences.
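The abstract mentions encoding schemes for DNA sequences and an auto-encoder, but gives no implementation details. The following is a minimal sketch of the general idea only, not the authors' method: it assumes one-hot encoding (one common scheme, not confirmed by the record) and a single-hidden-layer auto-encoder trained by plain gradient descent. The function names `one_hot_encode` and `train_autoencoder`, the hidden size, the learning rate, and the epoch count are all hypothetical choices made for illustration.

```python
import numpy as np

BASES = "ACGT"  # assumed alphabet; other characters (e.g. N) encode as all zeros


def one_hot_encode(seq):
    """Map a DNA string to a flat vector with 4 entries per base."""
    index = {base: i for i, base in enumerate(BASES)}
    out = np.zeros((len(seq), 4))
    for pos, base in enumerate(seq.upper()):
        if base in index:
            out[pos, index[base]] = 1.0
    return out.ravel()


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_autoencoder(X, hidden=8, epochs=500, lr=0.5, seed=0):
    """Train a one-hidden-layer auto-encoder to reconstruct the rows of X.

    Returns the learned parameters and the mean squared reconstruction
    error recorded at each epoch. All hyperparameters are illustrative.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    W1 = rng.normal(0.0, 0.1, (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.1, (hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        H = sigmoid(X @ W1 + b1)   # encoder: compressed representation
        Y = sigmoid(H @ W2 + b2)   # decoder: reconstruction of the input
        err = Y - X
        losses.append(float(np.mean(err ** 2)))
        # Backpropagate the squared error through both sigmoid layers.
        dY = err * Y * (1.0 - Y) / n
        dH = (dY @ W2.T) * H * (1.0 - H)
        W2 -= lr * (H.T @ dY)
        b2 -= lr * dY.sum(axis=0)
        W1 -= lr * (X.T @ dH)
        b1 -= lr * dH.sum(axis=0)
    return (W1, b1, W2, b2), losses
```

In this sketch, equal-length sequences would be stacked into a matrix, for example `X = np.stack([one_hot_encode(s) for s in seqs])`, before calling `train_autoencoder(X)`. The hidden activations could then serve as learned features for a downstream classifier, which is the usual role of auto-encoder pretraining.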
format Online
Article
Text
id pubmed-5731497
institution National Center for Biotechnology Information
language English
publishDate 2017
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-5731497 2017-12-19 A deep learning method for lincRNA detection using auto-encoder algorithm Yu, Ning Yu, Zeng Pan, Yi BMC Bioinformatics Research BioMed Central 2017-12-06 /pmc/articles/PMC5731497/ /pubmed/29244011 http://dx.doi.org/10.1186/s12859-017-1922-3 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Yu, Ning
Yu, Zeng
Pan, Yi
A deep learning method for lincRNA detection using auto-encoder algorithm
title A deep learning method for lincRNA detection using auto-encoder algorithm
title_full A deep learning method for lincRNA detection using auto-encoder algorithm
title_fullStr A deep learning method for lincRNA detection using auto-encoder algorithm
title_full_unstemmed A deep learning method for lincRNA detection using auto-encoder algorithm
title_short A deep learning method for lincRNA detection using auto-encoder algorithm
title_sort deep learning method for lincrna detection using auto-encoder algorithm
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731497/
https://www.ncbi.nlm.nih.gov/pubmed/29244011
http://dx.doi.org/10.1186/s12859-017-1922-3