NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information
With the explosive growth of protein sequences, large-scale automated protein function prediction (AFP) is becoming challenging. A protein is usually associated with dozens of gene ontology (GO) terms. Therefore, AFP is regarded as a problem of large-scale multi-label classification. Under the learn...
Main Authors: | Yao, Shuwei; You, Ronghui; Wang, Shaojun; Xiong, Yi; Huang, Xiaodi; Zhu, Shanfeng |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | Web Server Issue |
Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8262706/ https://www.ncbi.nlm.nih.gov/pubmed/34038555 http://dx.doi.org/10.1093/nar/gkab398 |
_version_ | 1783719236416307200 |
---|---|
author | Yao, Shuwei You, Ronghui Wang, Shaojun Xiong, Yi Huang, Xiaodi Zhu, Shanfeng |
author_facet | Yao, Shuwei You, Ronghui Wang, Shaojun Xiong, Yi Huang, Xiaodi Zhu, Shanfeng |
author_sort | Yao, Shuwei |
collection | PubMed |
description | With the explosive growth of protein sequences, large-scale automated protein function prediction (AFP) is becoming challenging. A protein is usually associated with dozens of gene ontology (GO) terms. Therefore, AFP is regarded as a problem of large-scale multi-label classification. Under the learning to rank (LTR) framework, our previous NetGO tool integrated massive networks and multi-type information about protein sequences to achieve good performance by dealing with all possible GO terms (>44 000). In this work, we propose the updated version as NetGO 2.0, which further improves the performance of large-scale AFP. NetGO 2.0 also incorporates literature information by logistic regression and deep sequence information by recurrent neural network (RNN) into the framework. We generate datasets following the critical assessment of functional annotation (CAFA) protocol. Experiment results show that NetGO 2.0 outperformed NetGO significantly in biological process ontology (BPO) and cellular component ontology (CCO). In particular, NetGO 2.0 achieved a 12.6% improvement over NetGO in terms of area under precision-recall curve (AUPR) in BPO and around 2.6% in terms of [Formula: see text] in CCO. These results demonstrate the benefits of incorporating text and deep sequence information for the functional annotation of BPO and CCO. The NetGO 2.0 web server is freely available at http://issubmission.sjtu.edu.cn/ng2/. |
format | Online Article Text |
id | pubmed-8262706 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-82627062021-07-08 NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information Yao, Shuwei You, Ronghui Wang, Shaojun Xiong, Yi Huang, Xiaodi Zhu, Shanfeng Nucleic Acids Res Web Server Issue With the explosive growth of protein sequences, large-scale automated protein function prediction (AFP) is becoming challenging. A protein is usually associated with dozens of gene ontology (GO) terms. Therefore, AFP is regarded as a problem of large-scale multi-label classification. Under the learning to rank (LTR) framework, our previous NetGO tool integrated massive networks and multi-type information about protein sequences to achieve good performance by dealing with all possible GO terms (>44 000). In this work, we propose the updated version as NetGO 2.0, which further improves the performance of large-scale AFP. NetGO 2.0 also incorporates literature information by logistic regression and deep sequence information by recurrent neural network (RNN) into the framework. We generate datasets following the critical assessment of functional annotation (CAFA) protocol. Experiment results show that NetGO 2.0 outperformed NetGO significantly in biological process ontology (BPO) and cellular component ontology (CCO). In particular, NetGO 2.0 achieved a 12.6% improvement over NetGO in terms of area under precision-recall curve (AUPR) in BPO and around 2.6% in terms of [Formula: see text] in CCO. These results demonstrate the benefits of incorporating text and deep sequence information for the functional annotation of BPO and CCO. The NetGO 2.0 web server is freely available at http://issubmission.sjtu.edu.cn/ng2/. Oxford University Press 2021-05-26 /pmc/articles/PMC8262706/ /pubmed/34038555 http://dx.doi.org/10.1093/nar/gkab398 Text en © The Author(s) 2021. Published by Oxford University Press on behalf of Nucleic Acids Research. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
spellingShingle | Web Server Issue Yao, Shuwei You, Ronghui Wang, Shaojun Xiong, Yi Huang, Xiaodi Zhu, Shanfeng NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information |
title | NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information |
title_full | NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information |
title_fullStr | NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information |
title_full_unstemmed | NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information |
title_short | NetGO 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information |
title_sort | netgo 2.0: improving large-scale protein function prediction with massive sequence, text, domain, family and network information |
topic | Web Server Issue |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8262706/ https://www.ncbi.nlm.nih.gov/pubmed/34038555 http://dx.doi.org/10.1093/nar/gkab398 |
work_keys_str_mv | AT yaoshuwei netgo20improvinglargescaleproteinfunctionpredictionwithmassivesequencetextdomainfamilyandnetworkinformation AT youronghui netgo20improvinglargescaleproteinfunctionpredictionwithmassivesequencetextdomainfamilyandnetworkinformation AT wangshaojun netgo20improvinglargescaleproteinfunctionpredictionwithmassivesequencetextdomainfamilyandnetworkinformation AT xiongyi netgo20improvinglargescaleproteinfunctionpredictionwithmassivesequencetextdomainfamilyandnetworkinformation AT huangxiaodi netgo20improvinglargescaleproteinfunctionpredictionwithmassivesequencetextdomainfamilyandnetworkinformation AT zhushanfeng netgo20improvinglargescaleproteinfunctionpredictionwithmassivesequencetextdomainfamilyandnetworkinformation |