
Improving Compound–Protein Interaction Prediction by Self-Training with Augmenting Negative Samples

Identifying compound-protein interactions (CPIs) is crucial for drug discovery. Since experimentally validating CPIs is often time-consuming and costly, computational approaches are expected to facilitate the process. The rapid growth of available CPI databases has accelerated the de...

Bibliographic Details
Main authors: Koyama, Takuto, Matsumoto, Shigeyuki, Iwata, Hiroaki, Kojima, Ryosuke, Okuno, Yasushi
Format: Online Article Text
Language: English
Published: American Chemical Society 2023
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10428206/
https://www.ncbi.nlm.nih.gov/pubmed/37460105
http://dx.doi.org/10.1021/acs.jcim.3c00269
author Koyama, Takuto
Matsumoto, Shigeyuki
Iwata, Hiroaki
Kojima, Ryosuke
Okuno, Yasushi
collection PubMed
description Identifying compound-protein interactions (CPIs) is crucial for drug discovery. Since experimentally validating CPIs is often time-consuming and costly, computational approaches are expected to facilitate the process. The rapid growth of available CPI databases has accelerated the development of many machine-learning methods for CPI prediction. However, their performance, particularly their generalizability to external data, often suffers from a data imbalance attributed to the lack of experimentally validated inactive (negative) samples. In this study, we developed a self-training method that augments the training data with negative samples that are both credible and informative, to improve the performance of models impaired by data imbalance. The constructed model demonstrated higher performance than models constructed with other conventional methods for addressing data imbalance, and the improvement was prominent for external datasets. Moreover, examination of the prediction score thresholds for pseudo-labeling during self-training revealed that augmenting the samples with ambiguous prediction scores is beneficial for constructing a model with high generalizability. The present study provides guidelines for improving CPI predictions on real-world data, thus facilitating drug discovery.
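The self-training idea summarized in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy two-feature data, the small logistic-regression scorer, the function names (`train_logreg`, `predict_one`, `self_train_negatives`), and the score window `[low, high)` used for pseudo-labeling are all assumptions made for illustration. The sketch only shows the general loop: train on imbalanced labeled data, score unlabeled pairs, add those whose scores fall in a chosen window as pseudo-negatives, and retrain.

```python
import math
import random


def predict_one(w, b, x):
    """Return the predicted probability of the positive (interacting) class."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep math.exp well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X, y, epochs=200, lr=0.5):
    """Fit a tiny logistic regression by batch gradient descent (illustrative scorer)."""
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        gw = [0.0] * len(w)
        gb = 0.0
        for xi, yi in zip(X, y):
            err = predict_one(w, b, xi) - yi
            for j, xj in enumerate(xi):
                gw[j] += err * xj
            gb += err
        w = [wj - lr * gj / n for wj, gj in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def self_train_negatives(X_lab, y_lab, X_unlab, low=0.0, high=0.5, rounds=3):
    """Iteratively add unlabeled samples scored in [low, high) as pseudo-negatives.

    The window bounds are hypothetical knobs; the abstract only states that the
    pseudo-labeling score thresholds matter, not which values were used.
    """
    X_train, y_train = list(X_lab), list(y_lab)
    pool = list(X_unlab)
    for _ in range(rounds):
        w, b = train_logreg(X_train, y_train)
        remaining = []
        for x in pool:
            if low <= predict_one(w, b, x) < high:
                X_train.append(x)
                y_train.append(0)  # pseudo-label as negative
            else:
                remaining.append(x)
        if len(remaining) == len(pool):  # nothing was added this round
            break
        pool = remaining
    w, b = train_logreg(X_train, y_train)  # final model on the augmented set
    return (w, b), X_train, y_train, pool


# Toy imbalanced data: many positives, few validated negatives (all made up).
rng = random.Random(0)
pos = [[rng.gauss(2, 1), rng.gauss(2, 1)] for _ in range(40)]
neg = [[rng.gauss(-2, 1), rng.gauss(-2, 1)] for _ in range(5)]
X_lab = pos + neg
y_lab = [1] * len(pos) + [0] * len(neg)
X_unlab = [[rng.gauss(0, 2), rng.gauss(0, 2)] for _ in range(60)]

(w, b), X_aug, y_aug, remaining = self_train_negatives(X_lab, y_lab, X_unlab)
print("pseudo-negatives added:", len(y_aug) - len(y_lab))
```

Narrowing or widening the `[low, high)` window is the simplest way to explore, on toy data, how the choice of pseudo-labeling threshold changes which samples are added.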
format Online
Article
Text
id pubmed-10428206
institution National Center for Biotechnology Information
language English
publishDate 2023
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-10428206 2023-08-17 Improving Compound–Protein Interaction Prediction by Self-Training with Augmenting Negative Samples Koyama, Takuto Matsumoto, Shigeyuki Iwata, Hiroaki Kojima, Ryosuke Okuno, Yasushi J Chem Inf Model American Chemical Society 2023-07-17 /pmc/articles/PMC10428206/ /pubmed/37460105 http://dx.doi.org/10.1021/acs.jcim.3c00269 Text en © 2023 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained, but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Improving Compound–Protein Interaction Prediction by Self-Training with Augmenting Negative Samples