Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function
| Main Authors: | Boadu, Frimpong; Cao, Hongyuan; Cheng, Jianlin |
|---|---|
| Format: | Online Article Text |
| Language: | English |
| Published: | Cold Spring Harbor Laboratory, 2023 |
| Subjects: | Article |
| Online Access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882282/ https://www.ncbi.nlm.nih.gov/pubmed/36711471 http://dx.doi.org/10.1101/2023.01.17.524477 |
_version_ | 1784879268297179136 |
author | Boadu, Frimpong Cao, Hongyuan Cheng, Jianlin |
author_facet | Boadu, Frimpong Cao, Hongyuan Cheng, Jianlin |
author_sort | Boadu, Frimpong |
collection | PubMed |
description | MOTIVATION: Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining the function of these proteins is still a time-consuming, low-throughput, and expensive process, leading to a large protein sequence-function gap. Therefore, it is important to develop computational methods that accurately predict protein function to fill the gap. Even though many methods have been developed to predict function from protein sequences, far fewer methods leverage protein structures, because accurate structures were unavailable for most proteins until recently. RESULTS: We developed TransFun, a method that uses a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective ways to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions with sequence similarity-based predictions can further increase prediction accuracy. AVAILABILITY: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun |
format | Online Article Text |
id | pubmed-9882282 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2023 |
publisher | Cold Spring Harbor Laboratory |
record_format | MEDLINE/PubMed |
spelling | pubmed-9882282 2023-01-28 Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function Boadu, Frimpong Cao, Hongyuan Cheng, Jianlin bioRxiv Article MOTIVATION: Millions of protein sequences have been generated by numerous genome and transcriptome sequencing projects. However, experimentally determining the function of these proteins is still a time-consuming, low-throughput, and expensive process, leading to a large protein sequence-function gap. Therefore, it is important to develop computational methods that accurately predict protein function to fill the gap. Even though many methods have been developed to predict function from protein sequences, far fewer methods leverage protein structures, because accurate structures were unavailable for most proteins until recently. RESULTS: We developed TransFun, a method that uses a transformer-based protein language model and 3D-equivariant graph neural networks to distill information from both protein sequences and structures to predict protein function. It extracts feature embeddings from protein sequences using a pre-trained protein language model (ESM) via transfer learning and combines them with 3D structures of proteins predicted by AlphaFold2 through equivariant graph neural networks. Benchmarked on the CAFA3 test dataset and a new test dataset, TransFun outperforms several state-of-the-art methods, indicating that the language model and 3D-equivariant graph neural networks are effective ways to leverage protein sequences and structures to improve protein function prediction. Combining TransFun predictions with sequence similarity-based predictions can further increase prediction accuracy.
AVAILABILITY: The source code of TransFun is available at https://github.com/jianlin-cheng/TransFun Cold Spring Harbor Laboratory 2023-01-20 /pmc/articles/PMC9882282/ /pubmed/36711471 http://dx.doi.org/10.1101/2023.01.17.524477 Text en https://creativecommons.org/licenses/by-nd/4.0/ This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License (https://creativecommons.org/licenses/by-nd/4.0/), which allows reusers to copy and distribute the material in any medium or format in unadapted form only, and only so long as attribution is given to the creator. The license allows for commercial use. |
spellingShingle | Article Boadu, Frimpong Cao, Hongyuan Cheng, Jianlin Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function |
title | Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function |
title_full | Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function |
title_fullStr | Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function |
title_full_unstemmed | Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function |
title_short | Combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function |
title_sort | combining protein sequences and structures with transformers and equivariant graph neural networks to predict protein function |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9882282/ https://www.ncbi.nlm.nih.gov/pubmed/36711471 http://dx.doi.org/10.1101/2023.01.17.524477 |
work_keys_str_mv | AT boadufrimpong combiningproteinsequencesandstructureswithtransformersandequivariantgraphneuralnetworkstopredictproteinfunction AT caohongyuan combiningproteinsequencesandstructureswithtransformersandequivariantgraphneuralnetworkstopredictproteinfunction AT chengjianlin combiningproteinsequencesandstructureswithtransformersandequivariantgraphneuralnetworkstopredictproteinfunction |
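The abstract describes the core technique behind TransFun: per-residue sequence embeddings (from ESM) attached as node features to a residue graph built from predicted 3D coordinates, processed by an E(3)-equivariant graph neural network, so that predictions do not depend on how the structure happens to be oriented in space. The sketch below is not the authors' implementation (which uses pretrained ESM and AlphaFold2 models); it is a minimal NumPy illustration of one simplified equivariant message-passing layer in the style of Satorras et al.'s EGNN, where random vectors stand in for ESM embeddings, random points stand in for C-alpha coordinates, and all MLP weights are untrained. All names and dimensions are illustrative assumptions.

```python
import numpy as np

def mlp(w1, b1, w2, b2, x):
    # Two-layer perceptron with ReLU, applied along the last axis.
    return np.maximum(x @ w1 + b1, 0.0) @ w2 + b2

class EGNNLayer:
    """One simplified E(3)-equivariant message-passing layer: messages depend
    only on node features and squared inter-residue distances (rotation- and
    translation-invariant quantities), so updated node features are invariant
    and updated coordinates transform the same way as the input structure."""

    def __init__(self, dim, hidden=32, seed=0):
        r = np.random.default_rng(seed)
        def lin(i, o):
            return r.normal(0.0, 0.1, (i, o)), np.zeros(o)
        self.e1, self.eb1 = lin(2 * dim + 1, hidden)   # edge/message MLP
        self.e2, self.eb2 = lin(hidden, hidden)
        self.x1, self.xb1 = lin(hidden, hidden)        # coordinate-weight MLP
        self.x2, self.xb2 = lin(hidden, 1)
        self.h1, self.hb1 = lin(dim + hidden, hidden)  # node-update MLP
        self.h2, self.hb2 = lin(hidden, dim)

    def __call__(self, h, x, edges):
        n, hidden = h.shape[0], self.e2.shape[1]
        msg_sum = np.zeros((n, hidden))
        x_new = x.astype(float).copy()
        for i, j in edges:
            d2 = np.sum((x[i] - x[j]) ** 2)            # invariant edge input
            m = mlp(self.e1, self.eb1, self.e2, self.eb2,
                    np.concatenate([h[i], h[j], [d2]]))
            msg_sum[i] += m
            w = mlp(self.x1, self.xb1, self.x2, self.xb2, m)[0]
            x_new[i] += (x[i] - x[j]) * w              # equivariant coord update
        h_new = mlp(self.h1, self.hb1, self.h2, self.hb2,
                    np.concatenate([h, msg_sum], axis=1))
        return h_new, x_new

# Toy "protein": 8 residues with hypothetical 16-dim embeddings standing in
# for ESM features and random 3D points standing in for predicted C-alpha
# coordinates; here the residue graph simply connects all residue pairs.
rng = np.random.default_rng(1)
coords = rng.normal(size=(8, 3))
feats = rng.normal(size=(8, 16))
edges = [(i, j) for i in range(8) for j in range(8) if i != j]

layer = EGNNLayer(dim=16)
h_out, x_out = layer(feats, coords, edges)

# Sanity check: rotating and translating the input structure leaves the node
# features unchanged and transforms the output coordinates identically.
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # random orthogonal matrix
t = rng.normal(size=3)
h_rot, x_rot = layer(feats, coords @ Q + t, edges)
print(np.allclose(h_out, h_rot, atol=1e-6))
print(np.allclose(x_out @ Q + t, x_rot, atol=1e-6))
```

The invariance check at the end is the property the abstract's "3D-equivariant" wording refers to: a function-prediction head reading `h_out` would give the same answer for any rigid-body placement of the AlphaFold2 model.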