NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity

MOTIVATION: Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence si...

Bibliographic Details
Main authors: Barot, Meet, Gligorijević, Vladimir, Cho, Kyunghyun, Bonneau, Richard
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8388039/
https://www.ncbi.nlm.nih.gov/pubmed/33576802
http://dx.doi.org/10.1093/bioinformatics/btab098
author Barot, Meet
Gligorijević, Vladimir
Cho, Kyunghyun
Bonneau, Richard
collection PubMed
description MOTIVATION: Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, most of these methods are tied to a network for a single species, and many species lack biological networks. RESULTS: In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and consequently leads to significant improvements in function prediction performance compared to two network-based methods, a deep learning sequence-based method and the BLAST annotation method used in the Critical Assessment of Functional Annotation. We are able to demonstrate that our approach performs well even in cases where a species has no network information available: when an organism’s PPI network is left out we can use our multi-species method to make predictions for the left-out organism with good performance. AVAILABILITY AND IMPLEMENTATION: The code is freely available at https://github.com/nowittynamesleft/NetQuilt. 
The data, including sequences, PPI networks and GO annotations are available at https://string-db.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format Online
Article
Text
id pubmed-8388039
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8388039 2021-08-26 NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity Barot, Meet Gligorijević, Vladimir Cho, Kyunghyun Bonneau, Richard Bioinformatics Original Papers MOTIVATION: Transferring knowledge between species is challenging: different species contain distinct proteomes and cellular architectures, which cause their proteins to carry out different functions via different interaction networks. Many approaches to protein functional annotation use sequence similarity to transfer knowledge between species. These approaches cannot produce accurate predictions for proteins without homologues of known function, as many functions require cellular context for meaningful prediction. To supply this context, network-based methods use protein-protein interaction (PPI) networks as a source of information for inferring protein function and have demonstrated promising results in function prediction. However, most of these methods are tied to a network for a single species, and many species lack biological networks. RESULTS: In this work, we integrate sequence and network information across multiple species by computing IsoRank similarity scores to create a meta-network profile of the proteins of multiple species. We use this integrated multispecies meta-network as input to train a maxout neural network with Gene Ontology terms as target labels. Our multispecies approach takes advantage of more training examples, and consequently leads to significant improvements in function prediction performance compared to two network-based methods, a deep learning sequence-based method and the BLAST annotation method used in the Critical Assessment of Functional Annotation. 
We are able to demonstrate that our approach performs well even in cases where a species has no network information available: when an organism’s PPI network is left out we can use our multi-species method to make predictions for the left-out organism with good performance. AVAILABILITY AND IMPLEMENTATION: The code is freely available at https://github.com/nowittynamesleft/NetQuilt. The data, including sequences, PPI networks and GO annotations are available at https://string-db.org/. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. Oxford University Press 2021-02-12 /pmc/articles/PMC8388039/ /pubmed/33576802 http://dx.doi.org/10.1093/bioinformatics/btab098 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title NetQuilt: deep multispecies network-based protein function prediction using homology-informed network similarity
topic Original Papers