PANDA2: protein function prediction using graph neural networks

High-throughput sequencing technologies have generated massive protein sequences, but the annotations of protein sequences highly rely on the low-throughput and expensive biological experiments. Therefore, accurate and fast computational alternatives are needed to infer functional knowledge from pro...

Bibliographic Details
Main authors: Zhao, Chenguang, Liu, Tong, Wang, Zheng
Format: Online Article Text
Language: English
Published: Oxford University Press 2022
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8808544/
https://www.ncbi.nlm.nih.gov/pubmed/35118378
http://dx.doi.org/10.1093/nargab/lqac004
_version_ 1784643899184119808
author Zhao, Chenguang
Liu, Tong
Wang, Zheng
author_facet Zhao, Chenguang
Liu, Tong
Wang, Zheng
author_sort Zhao, Chenguang
collection PubMed
description High-throughput sequencing technologies have generated massive protein sequences, but the annotations of protein sequences highly rely on the low-throughput and expensive biological experiments. Therefore, accurate and fast computational alternatives are needed to infer functional knowledge from protein sequences. The gene ontology (GO) directed acyclic graph (DAG) contains the hierarchical relationships between GO terms but is hard to be integrated into machine learning algorithms for functional predictions. We developed a deep learning system named PANDA2 to predict protein functions, which used the cutting-edge graph neural network to model the topology of the GO DAG and integrated the features generated by transformer protein language models. Compared with the top 10 methods in CAFA3, PANDA2 ranked first in cellular component ontology (CCO), tied first in biological process ontology (BPO) but had a higher coverage rate, and second in molecular function ontology (MFO). Compared with other recently-developed cutting-edge predictors DeepGOPlus, GOLabeler, and DeepText2GO, and benchmarked on another independent dataset, PANDA2 ranked first in CCO, first in BPO, and second in MFO. PANDA2 can be freely accessed from http://dna.cs.miami.edu/PANDA2/.
format Online
Article
Text
id pubmed-8808544
institution National Center for Biotechnology Information
language English
publishDate 2022
publisher Oxford University Press
record_format MEDLINE/PubMed
spelling pubmed-88085442022-02-02 PANDA2: protein function prediction using graph neural networks Zhao, Chenguang Liu, Tong Wang, Zheng NAR Genom Bioinform Standard Article High-throughput sequencing technologies have generated massive protein sequences, but the annotations of protein sequences highly rely on the low-throughput and expensive biological experiments. Therefore, accurate and fast computational alternatives are needed to infer functional knowledge from protein sequences. The gene ontology (GO) directed acyclic graph (DAG) contains the hierarchical relationships between GO terms but is hard to be integrated into machine learning algorithms for functional predictions. We developed a deep learning system named PANDA2 to predict protein functions, which used the cutting-edge graph neural network to model the topology of the GO DAG and integrated the features generated by transformer protein language models. Compared with the top 10 methods in CAFA3, PANDA2 ranked first in cellular component ontology (CCO), tied first in biological process ontology (BPO) but had a higher coverage rate, and second in molecular function ontology (MFO). Compared with other recently-developed cutting-edge predictors DeepGOPlus, GOLabeler, and DeepText2GO, and benchmarked on another independent dataset, PANDA2 ranked first in CCO, first in BPO, and second in MFO. PANDA2 can be freely accessed from http://dna.cs.miami.edu/PANDA2/. Oxford University Press 2022-02-02 /pmc/articles/PMC8808544/ /pubmed/35118378 http://dx.doi.org/10.1093/nargab/lqac004 Text en © The Author(s) 2022. Published by Oxford University Press on behalf of NAR Genomics and Bioinformatics. 
https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution-NonCommercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com
spellingShingle Standard Article
Zhao, Chenguang
Liu, Tong
Wang, Zheng
PANDA2: protein function prediction using graph neural networks
title PANDA2: protein function prediction using graph neural networks
title_full PANDA2: protein function prediction using graph neural networks
title_fullStr PANDA2: protein function prediction using graph neural networks
title_full_unstemmed PANDA2: protein function prediction using graph neural networks
title_short PANDA2: protein function prediction using graph neural networks
title_sort panda2: protein function prediction using graph neural networks
topic Standard Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8808544/
https://www.ncbi.nlm.nih.gov/pubmed/35118378
http://dx.doi.org/10.1093/nargab/lqac004