DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning
MOTIVATION: Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly d...
Main Authors: | Chen, Hao; Shaw, Dipan; Zeng, Jianyang; Bu, Dongbo; Jiang, Tao |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2019 |
Subjects: | Ismb/Eccb 2019 Conference Proceedings |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612874/ https://www.ncbi.nlm.nih.gov/pubmed/31510699 http://dx.doi.org/10.1093/bioinformatics/btz367 |
_version_ | 1783432956606414848 |
---|---|
author | Chen, Hao Shaw, Dipan Zeng, Jianyang Bu, Dongbo Jiang, Tao |
author_sort | Chen, Hao |
collection | PubMed |
description | MOTIVATION: Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly desirable. However, existing approaches to predicting isoform functions are far from satisfactory for at least two reasons: (i) unlike gene-level annotations, isoform-level functional annotations are scarce; and (ii) information about isoform functions is concealed in various types of data, including isoform sequences and co-expression relationships among isoforms. RESULTS: In this study, we present a novel approach, DIFFUSE (Deep learning-based prediction of IsoForm FUnctions from Sequences and Expression), to predict isoform functions. To integrate various types of data, our approach adopts a hybrid framework that first uses a deep neural network (DNN) to predict the functions of isoforms from their genomic sequences and then refines the prediction using a conditional random field (CRF) based on co-expression relationships. To overcome the lack of isoform-level ground-truth labels, we further propose an iterative semi-supervised learning algorithm to train the DNN and CRF together. Our extensive computational experiments demonstrate that DIFFUSE can effectively predict the functions of isoforms and genes. It achieves an average area under the receiver operating characteristic curve of 0.840 and area under the precision–recall curve of 0.581 over 4184 GO functional categories, both significantly higher than those of state-of-the-art methods. We further validate the prediction results by analyzing the correlation between functional similarity, sequence similarity, expression similarity and structural similarity, as well as the consistency between the predicted functions and some well-studied functional features of isoform sequences. AVAILABILITY AND IMPLEMENTATION: https://github.com/haochenucr/DIFFUSE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-6612874 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-6612874 2019-07-12 DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning Chen, Hao Shaw, Dipan Zeng, Jianyang Bu, Dongbo Jiang, Tao Bioinformatics Ismb/Eccb 2019 Conference Proceedings Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612874/ /pubmed/31510699 http://dx.doi.org/10.1093/bioinformatics/btz367 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning |
topic | Ismb/Eccb 2019 Conference Proceedings |