DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields
Main authors: | Shao, Mingfu; Ma, Jianzhu; Wang, Sheng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press, 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870651/ https://www.ncbi.nlm.nih.gov/pubmed/28881999 http://dx.doi.org/10.1093/bioinformatics/btx267 |
author | Shao, Mingfu; Ma, Jianzhu; Wang, Sheng |
collection | PubMed |
description | MOTIVATION: Reconstructing full-length expressed transcripts (a.k.a. the transcript assembly problem) from the short sequencing reads produced by RNA-seq plays a central role in identifying novel genes and transcripts, as well as in studying gene expression and gene function. A crucial step in transcript assembly is to accurately determine the splice junctions and boundaries of the expressed transcripts from the read alignments. In contrast to splice junctions, which can be efficiently detected from spliced reads, identifying boundaries remains an open and challenging problem because the signal related to boundaries is noisy and weak. RESULTS: We present DeepBound, an effective approach to identifying the boundaries of expressed transcripts from RNA-seq read alignments. At its core, DeepBound employs deep convolutional neural fields to learn the hidden distributions and patterns of boundaries. To accurately model the transition probabilities and to address the label-imbalance problem, we introduce the novel idea of incorporating the AUC (area under the curve) score into the objective function being optimized. Because deep probabilistic graphical models require a large number of labeled training samples, we propose using simulated RNA-seq datasets to train our model. Through extensive experimental studies on both simulated datasets of two species and biological datasets, we show that DeepBound consistently and significantly outperforms the two existing methods. AVAILABILITY AND IMPLEMENTATION: DeepBound is freely available at https://github.com/realbigws/DeepBound. |
format | Online Article Text |
id | pubmed-5870651 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-5870651, 2018-04-05. DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields. Shao, Mingfu; Ma, Jianzhu; Wang, Sheng. Bioinformatics. ISMB/ECCB 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017. Oxford University Press, 2017-07-15, 2017-07-12. /pmc/articles/PMC5870651/ /pubmed/28881999 http://dx.doi.org/10.1093/bioinformatics/btx267 Text, en. © The Author 2017. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields |
topic | ISMB/ECCB 2017: The 25th Annual Conference Intelligent Systems for Molecular Biology Held Jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017 |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870651/ https://www.ncbi.nlm.nih.gov/pubmed/28881999 http://dx.doi.org/10.1093/bioinformatics/btx267 |
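The abstract motivates an AUC-based objective by label imbalance: only a few genomic positions are transcript boundaries, so a model can look accurate while never predicting one. The following is a minimal Python sketch of that point, assuming a hypothetical toy labeling of 10 positions with a single boundary. The names `empirical_auc`, `smooth_auc`, and `accuracy` and the toy scores are illustrative and not part of DeepBound. `smooth_auc` (a sigmoid relaxation of the pairwise comparison) is one common way to make AUC differentiable and is not necessarily the formulation used in the paper.

```python
import math
from typing import List, Sequence, Tuple


def split_by_label(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of positive (boundary) and negative (non-boundary) positions."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    return pos, neg


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Rank-based AUC (Wilcoxon-Mann-Whitney): fraction of (positive, negative)
    pairs in which the positive position scores higher; ties count as half."""
    pos, neg = split_by_label(scores, labels)
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def smooth_auc(scores: Sequence[float], labels: Sequence[int], temperature: float = 0.1) -> float:
    """Differentiable surrogate: the 0/1 pairwise comparison is replaced by a
    sigmoid of the score difference (illustrative, hypothetical choice)."""
    pos, neg = split_by_label(scores, labels)
    total = sum(1.0 / (1.0 + math.exp(-(p - n) / temperature)) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of positions whose thresholded prediction matches the label."""
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(int(p == y) for p, y in zip(preds, labels)) / len(labels)


# Hypothetical toy data: 10 positions, only one of which is a boundary.
labels = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
constant = [0.0] * 10  # scores every position as "not a boundary"
ranked = [0.1, 0.05, 0.2, 0.4, 0.1, 0.0, 0.15, 0.1, 0.05, 0.0]  # boundary scored highest

for name, scores in [("constant", constant), ("ranked", ranked)]:
    print(name, accuracy(scores, labels), empirical_auc(scores, labels))
# constant 0.9 0.5
# ranked 0.9 1.0
```

Both predictors reach 90% accuracy, since predicting "no boundary" everywhere is already correct at 9 of 10 positions, yet only the second one ranks the boundary above every non-boundary position, which the AUC captures. The step-function `empirical_auc` has zero gradient almost everywhere, which is why a smooth surrogate such as `smooth_auc` is the kind of term that can be maximized by gradient-based training.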