Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function

MOTIVATION: Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining protein function is still a time-consuming, low-throughput, and expensive process, leading to a large protein sequence-function gap. It is therefore important to develop computational methods that accurately predict protein function to fill this gap. Although many methods use protein sequences as input to predict function, far fewer leverage protein structures, because accurate structures were unavailable for most proteins until recently.

RESULTS: We developed TransFun, a method that uses a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective at leveraging protein sequences and structures to improve protein function prediction. Combining TransFun predictions with sequence similarity-based predictions can further increase prediction accuracy.

AVAILABILITY: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun
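The structural component named in the abstract, an equivariant graph neural network over predicted 3D coordinates with language-model embeddings as node features, can be illustrated with a minimal numpy sketch of one E(3)-equivariant graph layer in the spirit of Satorras et al. (2021). This is a didactic toy, not the actual TransFun architecture: the random features stand in for ESM residue embeddings, the random coordinates for AlphaFold2 C-alpha positions, and all weights and dimensions are illustrative.

```python
import numpy as np

def egnn_layer(h, x, W_e, w_x, W_h):
    """One fully connected E(3)-equivariant graph layer (didactic sketch).

    h: (n, d) node features (stand-in for ESM residue embeddings)
    x: (n, 3) node coordinates (stand-in for C-alpha positions)
    """
    n, d = h.shape
    diff = x[:, None, :] - x[None, :, :]             # (n, n, 3) relative vectors
    dist2 = (diff ** 2).sum(-1, keepdims=True)       # (n, n, 1) invariant distances
    # Edge messages are built only from invariant quantities.
    edge_in = np.concatenate(
        [np.broadcast_to(h[:, None, :], (n, n, d)),
         np.broadcast_to(h[None, :, :], (n, n, d)),
         dist2], axis=-1)                            # (n, n, 2d+1)
    m = np.tanh(edge_in @ W_e)                       # (n, n, d) edge messages
    # Coordinate update: scalar weights times relative vectors -> equivariant.
    x_new = x + (diff * (m @ w_x)).sum(axis=1) / (n - 1)
    # Feature update from aggregated messages -> invariant.
    h_new = np.tanh(np.concatenate([h, m.sum(axis=1)], axis=-1) @ W_h)
    return h_new, x_new

rng = np.random.default_rng(0)
n, d = 5, 8                                  # 5 residues, 8-dim toy features
h = rng.normal(size=(n, d))
x = rng.normal(size=(n, 3))
W_e = rng.normal(size=(2 * d + 1, d)) * 0.1  # illustrative random weights
w_x = rng.normal(size=(d, 1)) * 0.1
W_h = rng.normal(size=(2 * d, d)) * 0.1

# Equivariance check: rotate and translate the input coordinates...
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))  # random orthogonal transform
t = rng.normal(size=3)
h1, x1 = egnn_layer(h, x, W_e, w_x, W_h)
h2, x2 = egnn_layer(h, x @ R.T + t, W_e, w_x, W_h)
# ...features stay invariant, coordinates transform the same way.
print(np.allclose(h2, h1, atol=1e-6), np.allclose(x2, x1 @ R.T + t, atol=1e-6))
```

Because the layer consumes only pairwise distances and relative vectors, its feature output is unchanged by rigid motions of the structure, which is the property that lets such networks consume AlphaFold2 coordinates in an orientation-independent way.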

Bibliographic Details
Main Authors: Boadu, Frimpong; Cao, Hongyuan; Cheng, Jianlin
Format: Online Article, Text
Language: English
Published: bioRxiv (Cold Spring Harbor Laboratory), 2023
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882282/
https://www.ncbi.nlm.nih.gov/pubmed/36711471
http://dx.doi.org/10.1101/2023.01.17.524477
Published Online: 2023-01-20
Collection: PubMed (National Center for Biotechnology Information)
License: This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use.
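The abstract also notes that combining TransFun scores with sequence similarity-based predictions further improves accuracy. A minimal sketch of one such ensemble, a per-GO-term weighted average of two score maps, is below; the mixing weight `alpha`, the function name, and the example scores are all illustrative, not values from the paper.

```python
def combine_predictions(model_scores, blast_scores, alpha=0.6):
    """Weighted per-GO-term combination of two score maps in [0, 1].

    alpha is an illustrative mixing weight, not a value from the paper.
    Terms missing from one source contribute a score of 0 from that source.
    """
    terms = set(model_scores) | set(blast_scores)
    return {t: alpha * model_scores.get(t, 0.0)
               + (1 - alpha) * blast_scores.get(t, 0.0)
            for t in terms}

# Hypothetical GO-term confidence scores from the two sources.
transfun = {"GO:0003677": 0.9, "GO:0005634": 0.4}
blast    = {"GO:0003677": 0.6, "GO:0008270": 0.5}
combined = combine_predictions(transfun, blast)
print(round(combined["GO:0003677"], 2))  # 0.6*0.9 + 0.4*0.6 = 0.78
```

In practice the weight would be tuned on a validation set, and the sequence-similarity scores would typically come from transferring GO annotations of BLAST hits weighted by alignment quality.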