
DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning

MOTIVATION: Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly d...


Bibliographic Details
Main authors: Chen, Hao, Shaw, Dipan, Zeng, Jianyang, Bu, Dongbo, Jiang, Tao
Format: Online Article Text
Language: English
Published: Oxford University Press 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6612874/
https://www.ncbi.nlm.nih.gov/pubmed/31510699
http://dx.doi.org/10.1093/bioinformatics/btz367
author Chen, Hao
Shaw, Dipan
Zeng, Jianyang
Bu, Dongbo
Jiang, Tao
collection PubMed
description MOTIVATION: Alternative splicing generates multiple isoforms from a single gene, greatly increasing the functional diversity of a genome. Although gene functions have been well studied, little is known about the specific functions of isoforms, making accurate prediction of isoform functions highly desirable. However, the existing approaches to predicting isoform functions are far from satisfactory due to at least two reasons: (i) unlike genes, isoform-level functional annotations are scarce. (ii) The information of isoform functions is concealed in various types of data including isoform sequences, co-expression relationship among isoforms, etc. RESULTS: In this study, we present a novel approach, DIFFUSE (Deep learning-based prediction of IsoForm FUnctions from Sequences and Expression), to predict isoform functions. To integrate various types of data, our approach adopts a hybrid framework by first using a deep neural network (DNN) to predict the functions of isoforms from their genomic sequences and then refining the prediction using a conditional random field (CRF) based on co-expression relationship. To overcome the lack of isoform-level ground truth labels, we further propose an iterative semi-supervised learning algorithm to train both the DNN and CRF together. Our extensive computational experiments demonstrate that DIFFUSE could effectively predict the functions of isoforms and genes. It achieves an average area under the receiver operating characteristics curve of 0.840 and area under the precision–recall curve of 0.581 over 4184 GO functional categories, which are significantly higher than the state-of-the-art methods. We further validate the prediction results by analyzing the correlation between functional similarity, sequence similarity, expression similarity and structural similarity, as well as the consistency between the predicted functions and some well-studied functional features of isoform sequences. 
AVAILABILITY AND IMPLEMENTATION: https://github.com/haochenucr/DIFFUSE. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
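The abstract reports its two headline figures as averages over GO functional categories. As a rough illustration of what such per-category averaging involves, the following is a minimal sketch in plain Python. It is not the authors' evaluation code: the function names, the use of average precision as the estimate of the area under the precision-recall curve, and the toy labels and scores are all assumptions made for this example.

```python
from typing import Callable, Sequence


def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Tied scores count as half a correct pair.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


def average_precision(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Average precision, one common estimate of the area under the PR curve.

    Assumption: the paper may use a different estimator. Tied scores are
    ordered by input position here.
    """
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive label")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this recall step
    return total / n_pos


def macro_average(
    metric: Callable[[Sequence[int], Sequence[float]], float],
    label_rows: Sequence[Sequence[int]],
    score_rows: Sequence[Sequence[float]],
) -> float:
    """Mean of a metric computed separately for each category (one row each)."""
    values = [metric(y, s) for y, s in zip(label_rows, score_rows)]
    return sum(values) / len(values)


# Toy data (made up): two categories, each row holds labels or scores per item.
toy_labels = [[1, 0, 1, 0], [0, 1]]
toy_scores = [[0.9, 0.8, 0.7, 0.1], [0.2, 0.6]]

mean_auroc = macro_average(auroc, toy_labels, toy_scores)
mean_auprc = macro_average(average_precision, toy_labels, toy_scores)
```

Averaging per category gives every category equal weight regardless of how many positive annotations it has, which is one plausible reading of "average ... over 4184 GO functional categories"; the record itself does not specify the averaging scheme.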
format Online
Article
Text
id pubmed-6612874
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-6612874 2019-07-12 DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning Chen, Hao Shaw, Dipan Zeng, Jianyang Bu, Dongbo Jiang, Tao Bioinformatics Ismb/Eccb 2019 Conference Proceedings Oxford University Press 2019-07 2019-07-05 /pmc/articles/PMC6612874/ /pubmed/31510699 http://dx.doi.org/10.1093/bioinformatics/btz367 Text en © The Author(s) 2019. Published by Oxford University Press. http://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title DIFFUSE: predicting isoform functions from sequences and expression profiles via deep learning
topic Ismb/Eccb 2019 Conference Proceedings