Predicting human protein function with multi-task deep neural networks

Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein...

Bibliographic Details
Main Authors:	Fa, Rui, Cozzetto, Domenico, Wan, Cen, Jones, David T.
Format:	Online Article Text
Language:	English
Published:	Public Library of Science 2018
Subjects:	Research Article
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5995439/ https://www.ncbi.nlm.nih.gov/pubmed/29889900 http://dx.doi.org/10.1371/journal.pone.0198216

_version_	1783330623848448000
author	Fa, Rui Cozzetto, Domenico Wan, Cen Jones, David T.
author_facet	Fa, Rui Cozzetto, Domenico Wan, Cen Jones, David T.
author_sort	Fa, Rui
collection	PubMed
description	Machine learning methods for protein function prediction are urgently needed, especially now that a substantial fraction of known sequences remains unannotated despite the extensive use of functional assignments based on sequence similarity. One major bottleneck supervised learning faces in protein function prediction is the structured, multi-label nature of the problem, because biological roles are represented by lists of terms from hierarchically organised controlled vocabularies such as the Gene Ontology. In this work, we build on recent developments in the area of deep learning and investigate the usefulness of multi-task deep neural networks (MTDNN), which consist of upstream shared layers upon which are stacked in parallel as many independent modules (additional hidden layers with their own output units) as the number of output GO terms (the tasks). MTDNN learns individual tasks partially using shared representations and partially from task-specific characteristics. When no close homologues with experimentally validated functions can be identified, MTDNN gives more accurate predictions than baseline methods based on annotation frequencies in public databases or homology transfers. More importantly, the results show that MTDNN binary classification accuracy is higher than that of alternative machine learning-based methods that do not exploit commonalities and differences among prediction tasks. Interestingly, compared with a single-task predictor, the performance improvement is not linearly correlated with the number of tasks in MTDNN; in our case, medium-sized models provide more improvement. One advantage of MTDNN is that, given a set of features, it does not require a bootstrap feature selection procedure, as traditional machine learning algorithms do. Overall, the results indicate that the proposed MTDNN algorithm improves the performance of protein function prediction. On the other hand, there is still considerable room for deep learning techniques to further enhance prediction ability.
format	Online Article Text
id	pubmed-5995439
institution	National Center for Biotechnology Information
language	English
publishDate	2018
publisher	Public Library of Science
record_format	MEDLINE/PubMed
spelling	pubmed-5995439 2018-06-21 Predicting human protein function with multi-task deep neural networks Fa, Rui Cozzetto, Domenico Wan, Cen Jones, David T. PLoS One Research Article Public Library of Science 2018-06-11 /pmc/articles/PMC5995439/ /pubmed/29889900 http://dx.doi.org/10.1371/journal.pone.0198216 Text en © 2018 Fa et al http://creativecommons.org/licenses/by/4.0/ This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
spellingShingle	Research Article Fa, Rui Cozzetto, Domenico Wan, Cen Jones, David T. Predicting human protein function with multi-task deep neural networks
title	Predicting human protein function with multi-task deep neural networks
title_full	Predicting human protein function with multi-task deep neural networks
title_fullStr	Predicting human protein function with multi-task deep neural networks
title_full_unstemmed	Predicting human protein function with multi-task deep neural networks
title_short	Predicting human protein function with multi-task deep neural networks
title_sort	predicting human protein function with multi-task deep neural networks
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5995439/ https://www.ncbi.nlm.nih.gov/pubmed/29889900 http://dx.doi.org/10.1371/journal.pone.0198216
work_keys_str_mv	AT farui predictinghumanproteinfunctionwithmultitaskdeepneuralnetworks AT cozzettodomenico predictinghumanproteinfunctionwithmultitaskdeepneuralnetworks AT wancen predictinghumanproteinfunctionwithmultitaskdeepneuralnetworks AT jonesdavidt predictinghumanproteinfunctionwithmultitaskdeepneuralnetworks
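
The description field above outlines the MTDNN layout: upstream shared layers, with one independent module (extra hidden layers and its own output unit) stacked in parallel per GO term. The following is a minimal sketch of a forward pass through such a layout, not the authors' implementation. The layer sizes, ReLU and sigmoid activations, random initialisation, feature vector, and the three GO identifiers are all assumptions chosen for illustration; the record does not specify any of them, and training is omitted entirely.

```python
import math
import random

def relu(z):
    return max(0.0, z)

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def dense(x, weights, biases, activation):
    """Apply one fully connected layer: activation(W x + b)."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]

def init_layer(rng, n_in, n_out):
    """Small random weights and zero biases (illustrative initialisation only)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out

def build_mtdnn(rng, n_features, n_shared, n_task_hidden, go_terms):
    """One shared upstream layer plus one independent module per GO term."""
    model = {"shared": init_layer(rng, n_features, n_shared), "tasks": {}}
    for term in go_terms:
        model["tasks"][term] = {
            "hidden": init_layer(rng, n_shared, n_task_hidden),
            "output": init_layer(rng, n_task_hidden, 1),
        }
    return model

def predict(model, features):
    """Return one score in (0, 1) per GO term for a single feature vector."""
    # Representation computed once and reused by every task module.
    shared = dense(features, *model["shared"], relu)
    scores = {}
    for term, module in model["tasks"].items():
        # Task-specific hidden layer followed by a single sigmoid output unit.
        hidden = dense(shared, *module["hidden"], relu)
        scores[term] = dense(hidden, *module["output"], sigmoid)[0]
    return scores

rng = random.Random(0)
go_terms = ["GO:0005515", "GO:0003677", "GO:0016301"]  # illustrative task labels
model = build_mtdnn(rng, n_features=8, n_shared=6, n_task_hidden=4, go_terms=go_terms)
protein = [rng.random() for _ in range(8)]  # made-up feature vector
predictions = predict(model, protein)
```

Each task reads the same shared representation but owns its hidden and output weights, which is what lets a model of this shape learn partly from shared structure and partly from task-specific characteristics. Treating every GO term as its own binary output also matches the multi-label framing in the description.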
