A deep learning method for lincRNA detection using auto-encoder algorithm
BACKGROUND: The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering previously unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by new deep learning methods. By scanning the newly found data...
Main Authors: | Yu, Ning; Yu, Zeng; Pan, Yi |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2017 |
Subjects: | |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731497/ https://www.ncbi.nlm.nih.gov/pubmed/29244011 http://dx.doi.org/10.1186/s12859-017-1922-3 |
_version_ | 1783286521589137408 |
---|---|
author | Yu, Ning Yu, Zeng Pan, Yi |
author_facet | Yu, Ning Yu, Zeng Pan, Yi |
author_sort | Yu, Ning |
collection | PubMed |
description | BACKGROUND: The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering previously unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two pieces of evidence give the rationale for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding-region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in biological processes as regulatory units instead of generating proteins. Identifying these transcriptional regions within non-coding regions is the first step towards lincRNA recognition. RESULTS: The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, the method is validated on the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. CONCLUSIONS: The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by the newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application of deep learning techniques to identifying lincRNA transcription sequences. |
format | Online Article Text |
id | pubmed-5731497 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-5731497 2017-12-19 A deep learning method for lincRNA detection using auto-encoder algorithm Yu, Ning Yu, Zeng Pan, Yi BMC Bioinformatics Research BACKGROUND: The RNA sequencing technique (RNA-seq) enables scientists to develop novel data-driven methods for discovering previously unidentified lincRNAs. Meanwhile, knowledge-based technologies are experiencing a potential revolution ignited by new deep learning methods. By scanning the newly found data set from RNA-seq, scientists have found that: (1) the expression of lincRNAs appears to be regulated, that is, relevance exists along the DNA sequences; (2) lincRNAs contain some conserved patterns/motifs tethered together by non-conserved regions. These two pieces of evidence give the rationale for adopting knowledge-based deep learning methods in lincRNA detection. Similar to coding-region transcription, non-coding regions are split at transcriptional sites. However, regulatory RNAs rather than messenger RNAs are generated. That is, the transcribed RNAs participate in biological processes as regulatory units instead of generating proteins. Identifying these transcriptional regions within non-coding regions is the first step towards lincRNA recognition. RESULTS: The auto-encoder method achieves 100% and 92.4% prediction accuracy on transcription sites over the putative data sets. The experimental results also show the excellent performance of the predictive deep neural network on the lincRNA data sets compared with a support vector machine and a traditional neural network. In addition, the method is validated on the newly discovered lincRNA data set, and one unreported transcription site is found by feeding the whole annotated sequences through the deep learning machine, which indicates that the deep learning method has extensive ability for lincRNA prediction. CONCLUSIONS: The transcriptional sequences of lincRNAs are collected from the annotated human DNA genome data. Subsequently, a two-layer deep neural network is developed for lincRNA detection, which adopts the auto-encoder algorithm and utilizes different encoding schemes to obtain the best performance over intergenic DNA sequence data. Driven by the newly annotated lincRNA data, deep learning methods based on the auto-encoder algorithm can exert their capability in knowledge learning in order to capture the useful features and the information correlation along DNA genome sequences for lincRNA detection. To our knowledge, this is the first application of deep learning techniques to identifying lincRNA transcription sequences. BioMed Central 2017-12-06 /pmc/articles/PMC5731497/ /pubmed/29244011 http://dx.doi.org/10.1186/s12859-017-1922-3 Text en © The Author(s) 2017 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
spellingShingle | Research Yu, Ning Yu, Zeng Pan, Yi A deep learning method for lincRNA detection using auto-encoder algorithm |
title | A deep learning method for lincRNA detection using auto-encoder algorithm |
title_full | A deep learning method for lincRNA detection using auto-encoder algorithm |
title_fullStr | A deep learning method for lincRNA detection using auto-encoder algorithm |
title_full_unstemmed | A deep learning method for lincRNA detection using auto-encoder algorithm |
title_short | A deep learning method for lincRNA detection using auto-encoder algorithm |
title_sort | deep learning method for lincrna detection using auto-encoder algorithm |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5731497/ https://www.ncbi.nlm.nih.gov/pubmed/29244011 http://dx.doi.org/10.1186/s12859-017-1922-3 |
work_keys_str_mv | AT yuning adeeplearningmethodforlincrnadetectionusingautoencoderalgorithm AT yuzeng adeeplearningmethodforlincrnadetectionusingautoencoderalgorithm AT panyi adeeplearningmethodforlincrnadetectionusingautoencoderalgorithm AT yuning deeplearningmethodforlincrnadetectionusingautoencoderalgorithm AT yuzeng deeplearningmethodforlincrnadetectionusingautoencoderalgorithm AT panyi deeplearningmethodforlincrnadetectionusingautoencoderalgorithm |
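
The description says the network "utilizes different encoding schemes" over DNA sequence data but does not name them. As a rough illustration only, the sketch below shows two encodings commonly used to turn a DNA string into numeric input for a neural network: one-hot (four values per base) and ordinal (one integer per base). Both schemes, the function names `one_hot_encode` and `ordinal_encode`, and the handling of unknown symbols are assumptions made for this example; they are not taken from the article.

```python
# Hypothetical illustration: the record does not state which encoding
# schemes the authors used. These are two common choices, nothing more.

ONE_HOT = {
    "A": (1, 0, 0, 0),
    "C": (0, 1, 0, 0),
    "G": (0, 0, 1, 0),
    "T": (0, 0, 0, 1),
}
ORDINAL = {"A": 0, "C": 1, "G": 2, "T": 3}


def one_hot_encode(seq):
    """Flatten a DNA string into a list of 0/1 values, four per base."""
    encoded = []
    for base in seq.upper():
        # Assumption: unknown symbols (for example N) become all zeros.
        encoded.extend(ONE_HOT.get(base, (0, 0, 0, 0)))
    return encoded


def ordinal_encode(seq):
    """Map each base to a single integer; unknown symbols become -1."""
    return [ORDINAL.get(base, -1) for base in seq.upper()]


if __name__ == "__main__":
    print(one_hot_encode("ACGT"))
    print(ordinal_encode("acgn"))
```

One-hot encoding avoids implying an order among bases at the cost of a four-times-longer input vector; ordinal encoding is compact but imposes an arbitrary numeric ordering. Which trade-off suits an auto-encoder over intergenic sequences would have to be checked against the article itself.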