A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks

Proteins are the building blocks of life, carrying out fundamental functions in biology. In computational biology, an effective protein representation facilitates many important biological quantifications. Most existing protein representation methods are derived from self‐supervised language models...

Bibliographic Details
Main authors: Hu, Fan; Hu, Yishen; Zhang, Weihong; Huang, Huazhen; Pan, Yi; Yin, Peng
Format: Online Article Text
Language: English
Published: John Wiley and Sons Inc. 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10401162/
https://www.ncbi.nlm.nih.gov/pubmed/37249398
http://dx.doi.org/10.1002/advs.202301223
author Hu, Fan
Hu, Yishen
Zhang, Weihong
Huang, Huazhen
Pan, Yi
Yin, Peng
author_sort Hu, Fan
collection PubMed
description Proteins are the building blocks of life, carrying out fundamental functions in biology. In computational biology, an effective protein representation facilitates many important biological quantifications. Most existing protein representation methods are derived from self‐supervised language models designed for text analysis. Proteins, however, are more than linear sequences of amino acids. Here, a multimodal deep learning framework for incorporating ≈1 million protein sequence, structure, and functional annotation (MASSA) is proposed. A multitask learning process with five specific pretraining objectives is presented to extract a fine‐grained protein‐domain feature. Through pretraining, multimodal protein representation achieves state‐of‐the‐art performance in specific downstream tasks such as protein properties (stability and fluorescence), protein‒protein interactions (shs27k/shs148k/string/skempi), and protein‒ligand interactions (kinase, DUD‐E), while achieving competitive results in secondary structure and remote homology tasks. Moreover, a novel optimal‐transport‐based metric with rich geometry awareness is introduced to quantify the dynamic transferability from the pretrained representation to the related downstream tasks, which provides a panoramic view of the step‐by‐step learning process. The pairwise distances between these downstream tasks are also calculated, and a strong correlation between the inter‐task feature space distributions and adaptability is observed.
format Online
Article
Text
id pubmed-10401162
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher John Wiley and Sons Inc.
record_format MEDLINE/PubMed
spelling pubmed-10401162 2023-08-05 Adv Sci (Weinh) Research Articles John Wiley and Sons Inc. 2023-05-30 /pmc/articles/PMC10401162/ /pubmed/37249398 http://dx.doi.org/10.1002/advs.202301223 Text en © 2023 The Authors. Advanced Science published by Wiley‐VCH GmbH. This is an open access article under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
title A Multimodal Protein Representation Framework for Quantifying Transferability Across Biochemical Downstream Tasks
topic Research Articles