DeepBound: accurate identification of transcript boundaries via deep convolutional neural fields

Bibliographic Details
Main authors: Shao, Mingfu; Ma, Jianzhu; Wang, Sheng
Format: Online article, text
Language: English
Published: Oxford University Press, 2017
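Subjects: ISMB/ECCB 2017: The 25th Annual Conference on Intelligent Systems for Molecular Biology, held jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017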
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5870651/
https://www.ncbi.nlm.nih.gov/pubmed/28881999
http://dx.doi.org/10.1093/bioinformatics/btx267
collection PubMed
description MOTIVATION: Reconstructing full-length expressed transcripts from the short sequencing reads produced by the RNA-seq protocol (the transcript assembly problem) plays a central role in identifying novel genes and transcripts, as well as in studying gene expression and gene function. A crucial step in transcript assembly is accurately determining the splicing junctions and boundaries of the expressed transcripts from the read alignment. Unlike splicing junctions, which can be detected efficiently from spliced reads, transcript boundaries remain an open and challenging problem, because the signal related to boundaries is noisy and weak. RESULTS: We present DeepBound, an effective approach to identifying the boundaries of expressed transcripts from RNA-seq read alignments. At its core, DeepBound employs deep convolutional neural fields to learn the hidden distributions and patterns of boundaries. To model the transition probabilities accurately and to address the label-imbalance problem, we incorporate the AUC (area under the curve) score into the objective function in a novel way. Because deep probabilistic graphical models require a large number of labeled training samples, we propose training our model on simulated RNA-seq datasets. Through extensive experiments on both simulated datasets from two species and biological datasets, we show that DeepBound consistently and significantly outperforms the two existing methods. AVAILABILITY AND IMPLEMENTATION: DeepBound is freely available at https://github.com/realbigws/DeepBound.
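The abstract does not spell out DeepBound's objective function. As a generic illustration only, assuming a per-position binary labeling (boundary vs. non-boundary) in which boundaries are rare, the Python sketch below shows why an AUC-type score is less sensitive to label imbalance than accuracy, and how AUC can be relaxed into a differentiable quantity. The helper names (`accuracy`, `empirical_auc`, `smooth_auc`) and the sigmoid relaxation are hypothetical choices for this example; they are not taken from DeepBound or its repository.

```python
# Hypothetical illustration of AUC under label imbalance; NOT DeepBound's objective.
import math
from typing import List, Sequence, Tuple


def _split(scores: Sequence[float], labels: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Separate scores of positive (label 1) and negative (label 0) positions."""
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    return pos, neg


def _sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Fraction of positions whose thresholded prediction matches the label."""
    hits = sum((s >= threshold) == (y == 1) for s, y in zip(scores, labels))
    return hits / len(labels)


def empirical_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Wilcoxon-Mann-Whitney AUC: share of (positive, negative) pairs in which
    the positive position scores higher; ties count as one half."""
    pos, neg = _split(scores, labels)
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


def smooth_auc(scores: Sequence[float], labels: Sequence[int], temperature: float = 0.1) -> float:
    """Differentiable surrogate (hypothetical): the step 1[p > n] is replaced by
    sigmoid((p - n) / temperature); smaller temperatures approach empirical_auc."""
    pos, neg = _split(scores, labels)
    total = sum(_sigmoid((p - n) / temperature) for p in pos for n in neg)
    return total / (len(pos) * len(neg))


# Toy data: 20 positions with a single transcript boundary at position 7.
labels = [0] * 20
labels[7] = 1

trivial = [0.0] * 20      # never favors any position as a boundary
informative = [0.1] * 20  # ranks the true boundary highest,
informative[7] = 0.4      # but stays below the 0.5 threshold

print(accuracy(trivial, labels), empirical_auc(trivial, labels))          # 0.95 0.5
print(accuracy(informative, labels), empirical_auc(informative, labels))  # 0.95 1.0
```

In the toy data both scorers reach 95% accuracy, but only the informative one attains an AUC of 1.0. A model trained to maximize a smooth surrogate such as `smooth_auc` is rewarded for ranking the rare boundary positions above the rest, rather than for predicting the majority label everywhere.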
id pubmed-5870651
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
journal Bioinformatics
conference ISMB/ECCB 2017: The 25th Annual Conference on Intelligent Systems for Molecular Biology, held jointly with the 16th Annual European Conference on Computational Biology, Prague, Czech Republic, July 21–25, 2017
dates 2018-04-05; 2017-07-15; 2017-07-12
rights © The Author 2017. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com.