CSI: Contrastive data Stratification for Interaction prediction and its application to compound–protein interaction prediction

MOTIVATION: Accurately predicting the likelihood of interaction between two objects (compound–protein sequence, user–item, author–paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly,...


Bibliographic Details
Main authors: Kalia, Apurva; Krishnan, Dilip; Hassoun, Soha
Format: Online Article Text
Language: English
Published: Oxford University Press 2023
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10423023/
https://www.ncbi.nlm.nih.gov/pubmed/37490457
http://dx.doi.org/10.1093/bioinformatics/btad456
collection PubMed
description MOTIVATION: Accurately predicting the likelihood of interaction between two objects (compound–protein sequence, user–item, author–paper, etc.) is a fundamental problem in Computer Science. Current deep-learning models rely on learning accurate representations of the interacting objects. Importantly, relationships between the interacting objects, or features of the interaction, offer an opportunity to partition the data to create multiple views of the interacting objects. The resulting congruent and non-congruent views can then be exploited via contrastive learning techniques to learn enhanced representations of the objects.
RESULTS: We present a novel method, Contrastive Stratification for Interaction Prediction (CSI), to stratify (partition) a dataset in a manner that can be exploited via Contrastive Multiview Coding to learn embeddings that maximize the mutual information across congruent data views. CSI assigns a key and multiple views to each data point, where data partitions under a particular key form congruent views of the data. We showcase the effectiveness of CSI by applying it to the compound–protein sequence interaction prediction problem, a pressing problem whose solution promises to expedite drug delivery (drug–protein interaction prediction), metabolic engineering, and synthetic biology (compound–enzyme interaction prediction) applications. We compare CSI with a baseline model that uses neither data stratification nor contrastive learning, and show gains in average precision ranging from 13.7% to 39% using compounds and sequences as keys across multiple drug–target and enzymatic datasets, and gains ranging from 16.9% to 63% using reaction features as keys across enzymatic datasets.
AVAILABILITY AND IMPLEMENTATION: Code and dataset available at https://github.com/HassounLab/CSI.
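The stratification idea in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (that is at the GitHub URL above); it assumes that each interaction is a (compound, protein) pair, that the chosen key is one side of the pair, and that the partner objects grouped under one key serve as its congruent views. The function names `stratify_by_key` and `info_nce`, the `temperature` parameter, and the choice of an InfoNCE-style loss as the contrastive objective are all assumptions made for illustration.

```python
import math
from collections import defaultdict


def stratify_by_key(pairs, key_index=0):
    """Partition interaction pairs by a key (hypothetical helper).

    Each pair is (compound, protein). The element at key_index is the key;
    the other element is recorded as one view under that key.
    """
    groups = defaultdict(list)
    for pair in pairs:
        groups[pair[key_index]].append(pair[1 - key_index])
    return dict(groups)


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm


def info_nce(anchors, positives, temperature=0.1):
    """InfoNCE-style loss (assumed objective, for illustration only).

    anchors[i] and positives[i] are embeddings of congruent views;
    every positives[j] with j != i acts as a non-congruent view.
    """
    total = 0.0
    for i, anchor in enumerate(anchors):
        logits = [cosine(anchor, p) / temperature for p in positives]
        peak = max(logits)  # subtract the max for numerical stability
        log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denominator - logits[i]
    return total / len(anchors)


# Toy interactions: compound c1 interacts with proteins pA and pB.
pairs = [("c1", "pA"), ("c1", "pB"), ("c2", "pA")]
by_compound = stratify_by_key(pairs, key_index=0)
by_protein = stratify_by_key(pairs, key_index=1)

# Toy embeddings: aligned views should score a lower loss than misaligned ones.
views = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
aligned_loss = info_nce(views, views)
misaligned_loss = info_nce(views, views[1:] + views[:1])
```

Switching `key_index` changes which object acts as the key, mirroring the abstract's use of compounds or sequences as keys. Using reaction features as keys would require a third field per record, which this sketch does not model.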
id pubmed-10423023
institution National Center for Biotechnology Information
record_format MEDLINE/PubMed
spelling pubmed-10423023 2023-08-13 Bioinformatics Original Paper Oxford University Press 2023-07-25 Text en © The Author(s) 2023. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
topic Original Paper