Inverse similarity and reliable negative samples for drug side-effect prediction
BACKGROUND: In silico prediction of potential drug side-effects is of crucial importance for drug development, since wet experimental identification of drug side-effects is expensive and time-consuming. Existing computational methods mainly focus on leveraging validated drug side-effect relations fo...
Main authors: | Zheng, Yi; Peng, Hui; Ghosh, Shameek; Lan, Chaowang; Li, Jinyan |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2019 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7402513/ https://www.ncbi.nlm.nih.gov/pubmed/30717666 http://dx.doi.org/10.1186/s12859-018-2563-x |
_version_ | 1783566772399505408 |
---|---|
author | Zheng, Yi Peng, Hui Ghosh, Shameek Lan, Chaowang Li, Jinyan |
author_facet | Zheng, Yi Peng, Hui Ghosh, Shameek Lan, Chaowang Li, Jinyan |
author_sort | Zheng, Yi |
collection | PubMed |
description | BACKGROUND: In silico prediction of potential drug side-effects is of crucial importance for drug development, since wet experimental identification of drug side-effects is expensive and time-consuming. Existing computational methods mainly focus on leveraging validated drug side-effect relations for the prediction. Their performance is severely impeded by the lack of reliable negative training data. Thus, a method to select reliable negative samples is vital to improving performance. METHODS: Most of the existing computational prediction methods are essentially based on the assumption that similar drugs are inclined to share the same side-effects, which has given rise to remarkable performance. It is also rational to assume the inverse proposition that dissimilar drugs are less likely to share the same side-effects. Based on this inverse similarity hypothesis, we proposed a novel method to select highly-reliable negative samples for side-effect prediction. The first step of our method is to build a drug similarity integration framework to measure the similarity between drugs from different perspectives. This step integrates drug chemical structures, drug target proteins, drug substituents, and drug therapeutic information as features into a unified framework. Then, a similarity score between each candidate negative drug and the validated positive drugs is calculated using the similarity integration framework. Candidate negative drugs with lower similarity scores are preferentially selected as negative samples. Finally, both the validated positive drugs and the selected highly-reliable negative samples are used for predictions. RESULTS: The performance of the proposed method was evaluated on simulative side-effect prediction of 917 DrugBank drugs, comparing with four machine-learning algorithms. Extensive experiments show that the drug similarity integration framework has superior capability in capturing drug features, achieving much better performance than frameworks based on a single type of drug property. In addition, the four machine-learning algorithms achieved significant improvement in macro-averaging F1-score (e.g., SVM from 0.655 to 0.898), macro-averaging precision (e.g., RBF from 0.592 to 0.828) and macro-averaging recall (e.g., KNN from 0.651 to 0.772), complementarily attributed to the highly-reliable negative samples selected by the proposed method. CONCLUSIONS: The results suggest that the inverse similarity hypothesis and the integration of different drug properties are valuable for side-effect prediction. The selection of highly-reliable negative samples can also make significant contributions to the performance improvement. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1186/s12859-018-2563-x) contains supplementary material, which is available to authorized users. |
format | Online Article Text |
id | pubmed-7402513 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2019 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-7402513 2020-08-07 Inverse similarity and reliable negative samples for drug side-effect prediction Zheng, Yi Peng, Hui Ghosh, Shameek Lan, Chaowang Li, Jinyan BMC Bioinformatics Research BioMed Central 2019-02-04 /pmc/articles/PMC7402513/ /pubmed/30717666 http://dx.doi.org/10.1186/s12859-018-2563-x Text en © The Author(s) 2019 Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated. |
title | Inverse similarity and reliable negative samples for drug side-effect prediction |
topic | Research |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7402513/ https://www.ncbi.nlm.nih.gov/pubmed/30717666 http://dx.doi.org/10.1186/s12859-018-2563-x |
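
The negative-sample selection step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the record does not say how the per-property similarities are combined or how a candidate is scored against the positive drugs, so the element-wise average and the mean-similarity score below are assumptions, and the function names and toy matrices are hypothetical.

```python
from statistics import mean


def integrate_similarity(views):
    """Combine several drug-by-drug similarity matrices into one.

    Assumption: a plain element-wise average. The paper's integration
    framework may weight or combine the views differently.
    """
    n = len(views[0])
    return [[mean(v[i][j] for v in views) for j in range(n)] for i in range(n)]


def select_reliable_negatives(similarity, positives, k):
    """Return the k candidate drugs least similar to the validated positives."""
    positives = set(positives)
    candidates = [d for d in range(len(similarity)) if d not in positives]
    # Assumption: a candidate's score is its mean similarity to the positives.
    scores = {d: mean(similarity[d][p] for p in positives) for d in candidates}
    # Lower score = more dissimilar = preferred as a negative sample.
    return sorted(candidates, key=lambda d: scores[d])[:k]


# Toy data: 4 drugs, two hypothetical similarity views (values are made up).
chemical = [[1.0, 0.8, 0.2, 0.1],
            [0.8, 1.0, 0.3, 0.2],
            [0.2, 0.3, 1.0, 0.6],
            [0.1, 0.2, 0.6, 1.0]]
target = [[1.0, 0.6, 0.4, 0.1],
          [0.6, 1.0, 0.1, 0.0],
          [0.4, 0.1, 1.0, 0.5],
          [0.1, 0.0, 0.5, 1.0]]

integrated = integrate_similarity([chemical, target])
# Drugs 0 and 1 are the validated positives for some side-effect.
negatives = select_reliable_negatives(integrated, positives=[0, 1], k=1)
print(negatives)
```

In this toy case drug 3 is chosen because it is the least similar to the two positive drugs in both views. The selected negatives would then be passed, together with the positives, to whichever classifier is being trained.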