NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information

With the explosive growth of protein sequences, large-scale automated protein function prediction (AFP) is becoming challenging. A protein is usually associated with dozens of gene ontology (GO) terms. Therefore, AFP is regarded as a problem of large-scale multi-label classification. Under the learn...

Bibliographic Details
Main Authors: Yao, Shuwei, You, Ronghui, Wang, Shaojun, Xiong, Yi, Huang, Xiaodi, Zhu, Shanfeng
Format: Online Article Text
Language: English
Published: Oxford University Press 2021
Subjects: Web Server Issue
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8262706/
https://www.ncbi.nlm.nih.gov/pubmed/34038555
http://dx.doi.org/10.1093/nar/gkab398
author Yao, Shuwei
You, Ronghui
Wang, Shaojun
Xiong, Yi
Huang, Xiaodi
Zhu, Shanfeng
collection PubMed
description With the explosive growth of protein sequences, large-scale automated protein function prediction (AFP) is becoming challenging. A protein is usually associated with dozens of gene ontology (GO) terms. Therefore, AFP is regarded as a problem of large-scale multi-label classification. Under the learning to rank (LTR) framework, our previous NetGO tool integrated massive networks and multi-type information about protein sequences to achieve good performance by dealing with all possible GO terms (>44 000). In this work, we propose an updated version, NetGO 2.0, which further improves the performance of large-scale AFP. NetGO 2.0 also incorporates literature information by logistic regression and deep sequence information by a recurrent neural network (RNN) into the framework. We generate datasets following the critical assessment of functional annotation (CAFA) protocol. Experimental results show that NetGO 2.0 outperformed NetGO significantly in biological process ontology (BPO) and cellular component ontology (CCO). In particular, NetGO 2.0 achieved a 12.6% improvement over NetGO in terms of area under precision-recall curve (AUPR) in BPO and around 2.6% in terms of Fmax in CCO. These results demonstrate the benefits of incorporating text and deep sequence information for the functional annotation of BPO and CCO. The NetGO 2.0 web server is freely available at http://issubmission.sjtu.edu.cn/ng2/.
format Online
Article
Text
id pubmed-8262706
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-8262706 2021-07-08 Nucleic Acids Res Web Server Issue Oxford University Press 2021-05-26 /pmc/articles/PMC8262706/ /pubmed/34038555 http://dx.doi.org/10.1093/nar/gkab398 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research.
This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
title NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information
topic Web Server Issue