
Inverse similarity and reliable negative samples for drug side-effect prediction

BACKGROUND: In silico prediction of potential drug side-effects is of crucial importance for drug development, since wet experimental identification of drug side-effects is expensive and time-consuming. Existing computational methods mainly focus on leveraging validated drug side-effect relations fo...


Bibliographic Details
Main authors: Zheng, Yi, Peng, Hui, Ghosh, Shameek, Lan, Chaowang, Li, Jinyan
Format: Online Article Text
Language: English
Published: BioMed Central 2019
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7402513/
https://www.ncbi.nlm.nih.gov/pubmed/30717666
http://dx.doi.org/10.1186/s12859-018-2563-x
_version_ 1783566772399505408
author Zheng, Yi
Peng, Hui
Ghosh, Shameek
Lan, Chaowang
Li, Jinyan
author_facet Zheng, Yi
Peng, Hui
Ghosh, Shameek
Lan, Chaowang
Li, Jinyan
author_sort Zheng, Yi
collection PubMed
description BACKGROUND: In silico prediction of potential drug side-effects is crucial for drug development, because wet-lab identification of drug side-effects is expensive and time-consuming. Existing computational methods mainly leverage validated drug side-effect relations for prediction, and their performance is severely impeded by the lack of reliable negative training data. A method for selecting reliable negative samples is therefore vital for improving performance.
METHODS: Most existing computational prediction methods rest on the assumption that similar drugs tend to share the same side-effects, an assumption that has yielded remarkable performance. It is equally reasonable to assume the inverse proposition: dissimilar drugs are less likely to share the same side-effects. Based on this inverse similarity hypothesis, we propose a novel method to select highly reliable negative samples for side-effect prediction. The first step builds a drug similarity integration framework that measures the similarity between drugs from different perspectives, integrating drug chemical structures, drug target proteins, drug substituents, and drug therapeutic information as features. Next, the framework is used to calculate a similarity score between each candidate negative drug and the validated positive drugs. Candidate negative drugs with lower similarity scores are preferentially selected as negative samples. Finally, both the validated positive drugs and the selected highly reliable negative samples are used for prediction.
RESULTS: The proposed method was evaluated on simulated side-effect prediction for 917 DrugBank drugs using four machine-learning algorithms. Extensive experiments show that the drug similarity integration framework captures drug features better than approaches based on a single type of drug property, and achieves much better performance. In addition, the four machine-learning algorithms achieved significant improvements in macro-averaged F1-score (e.g., SVM from 0.655 to 0.898), macro-averaged precision (e.g., RBF from 0.592 to 0.828), and macro-averaged recall (e.g., KNN from 0.651 to 0.772), improvements attributed to the highly reliable negative samples selected by the proposed method.
CONCLUSIONS: The results suggest that the inverse similarity hypothesis and the integration of different drug properties are valuable for side-effect prediction. The selection of highly reliable negative samples can also contribute significantly to performance improvement.
ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2563-x) contains supplementary material, which is available to authorized users.
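The selection procedure summarized under METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not state how the per-feature similarities are integrated or how a candidate's score against the positive drugs is aggregated, so the weighted mean in `integrate_similarities` and the mean over positives in `select_reliable_negatives` are assumptions, as are both function names and the toy similarity values.

```python
import numpy as np


def integrate_similarities(matrices, weights=None):
    """Combine per-feature drug-drug similarity matrices into one matrix.

    Assumption: a weighted mean; the record does not specify the real scheme.
    """
    stacked = np.stack([np.asarray(m, dtype=float) for m in matrices])
    if weights is None:
        weights = np.ones(len(matrices))
    weights = np.asarray(weights, dtype=float)
    weights = weights / weights.sum()
    # Contract the feature axis: (k,) x (k, n, n) -> (n, n)
    return np.tensordot(weights, stacked, axes=1)


def select_reliable_negatives(similarity, positive_idx, candidate_idx, n_select):
    """Return the n_select candidates least similar to the positive drugs.

    Assumption: a candidate's score is its mean similarity to all positives.
    """
    positive_idx = np.asarray(positive_idx)
    candidate_idx = np.asarray(candidate_idx)
    scores = similarity[np.ix_(candidate_idx, positive_idx)].mean(axis=1)
    order = np.argsort(scores, kind="stable")[:n_select]  # lowest scores first
    return candidate_idx[order], scores[order]


# Toy data (hypothetical): 5 drugs, two feature-specific similarity matrices.
chemical = np.array([
    [1.0, 0.8, 0.2, 0.1, 0.5],
    [0.8, 1.0, 0.3, 0.2, 0.4],
    [0.2, 0.3, 1.0, 0.7, 0.6],
    [0.1, 0.2, 0.7, 1.0, 0.3],
    [0.5, 0.4, 0.6, 0.3, 1.0],
])
target = np.array([
    [1.0, 0.6, 0.0, 0.1, 0.7],
    [0.6, 1.0, 0.1, 0.0, 0.6],
    [0.0, 0.1, 1.0, 0.5, 0.2],
    [0.1, 0.0, 0.5, 1.0, 0.1],
    [0.7, 0.6, 0.2, 0.1, 1.0],
])

combined = integrate_similarities([chemical, target])
# Drugs 0 and 1 are validated positives for one side-effect; 2, 3, 4 are candidates.
negatives, scores = select_reliable_negatives(combined, [0, 1], [2, 3, 4], n_select=2)
print(negatives, scores)
```

In this toy case drug 4 resembles the positive drugs and is left out, while drugs 3 and 2 are kept as negatives. Taking the maximum similarity to any positive instead of the mean would be a stricter alternative, and the choice should follow the full article rather than this sketch.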
format Online
Article
Text
id pubmed-7402513
institution National Center for Biotechnology Information
language English
publishDate 2019
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-7402513 2020-08-07 Inverse similarity and reliable negative samples for drug side-effect prediction Zheng, Yi Peng, Hui Ghosh, Shameek Lan, Chaowang Li, Jinyan BMC Bioinformatics Research BioMed Central 2019-02-04 /pmc/articles/PMC7402513/ /pubmed/30717666 http://dx.doi.org/10.1186/s12859-018-2563-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
spellingShingle Research
Zheng, Yi
Peng, Hui
Ghosh, Shameek
Lan, Chaowang
Li, Jinyan
Inverse similarity and reliable negative samples for drug side-effect prediction
title Inverse similarity and reliable negative samples for drug side-effect prediction
title_full Inverse similarity and reliable negative samples for drug side-effect prediction
title_fullStr Inverse similarity and reliable negative samples for drug side-effect prediction
title_full_unstemmed Inverse similarity and reliable negative samples for drug side-effect prediction
title_short Inverse similarity and reliable negative samples for drug side-effect prediction
title_sort inverse similarity and reliable negative samples for drug side-effect prediction
topic Research
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7402513/
https://www.ncbi.nlm.nih.gov/pubmed/30717666
http://dx.doi.org/10.1186/s12859-018-2563-x
work_keys_str_mv AT zhengyi inversesimilarityandreliablenegativesamplesfordrugsideeffectprediction
AT penghui inversesimilarityandreliablenegativesamplesfordrugsideeffectprediction
AT ghoshshameek inversesimilarityandreliablenegativesamplesfordrugsideeffectprediction
AT lanchaowang inversesimilarityandreliablenegativesamplesfordrugsideeffectprediction
AT lijinyan inversesimilarityandreliablenegativesamplesfordrugsideeffectprediction