
A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction

BACKGROUND: Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction predict...


Bibliographic Details
Main authors: Öztürk, Hakime; Ozkirimli, Elif; Özgür, Arzucan
Format: Online Article Text
Language: English
Published: BioMed Central 2016
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4797122/
https://www.ncbi.nlm.nih.gov/pubmed/26987649
http://dx.doi.org/10.1186/s12859-016-0977-x
author Öztürk, Hakime
Ozkirimli, Elif
Özgür, Arzucan
collection PubMed
description BACKGROUND: Molecular structures can be represented as strings of special characters using SMILES. Since each molecule is represented as a string, the similarity between compounds can be computed using SMILES-based string similarity functions. Most previous studies on drug-target interaction prediction use 2D-based compound similarity kernels such as SIMCOMP. To the best of our knowledge, SMILES-based similarity functions, which are computationally more efficient than the 2D-based kernels, have not previously been investigated for this task. RESULTS: In this study, we adapt and evaluate various SMILES-based similarity methods for drug-target interaction prediction. In addition, inspired by the vector space model of Information Retrieval, we propose cosine similarity-based SMILES kernels that use Term Frequency (TF) and Term Frequency-Inverse Document Frequency (TF-IDF) weighting. We also investigate composite kernels that combine our best SMILES-based similarity functions with the SIMCOMP kernel. In total, we compare 13 different ligand similarity functions, each of which uses the SMILES representation of the molecule. CONCLUSION: The more efficient SMILES-based similarity functions performed similarly to the more complex 2D-based SIMCOMP kernel in terms of AUC-ROC scores. The TF-IDF-based cosine similarity obtained a better AUC-PR score than the SIMCOMP kernel on the GPCR benchmark data set. The composite kernel of TF-IDF-based cosine similarity and SIMCOMP achieved the best AUC-PR scores on all data sets. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (doi:10.1186/s12859-016-0977-x) contains supplementary material, which is available to authorized users.
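The abstract names TF- and TF-IDF-weighted cosine similarity over SMILES strings, plus a composite kernel with SIMCOMP, but gives no implementation details. The following is a minimal sketch under stated assumptions, not the authors' method: the "terms" are assumed to be overlapping length-q substrings of each SMILES string (LINGO-style; `q` is a hypothetical parameter), IDF is assumed to be log(N/df), and the composite kernel is assumed to be a weighted average with a hypothetical weight `alpha`. SIMCOMP itself is not implemented; any precomputed similarity matrix of the same shape can stand in for it.

```python
import math
from collections import Counter


def smiles_terms(smiles: str, q: int = 4) -> Counter:
    """Count overlapping length-q substrings of a SMILES string.

    ASSUMPTION: this tokenization and the default q=4 are illustrative,
    not taken from the source record.
    """
    if len(smiles) < q:
        return Counter([smiles])  # short strings become a single term
    return Counter(smiles[i:i + q] for i in range(len(smiles) - q + 1))


def idf_weights(docs: list) -> dict:
    """Inverse document frequency log(N / df) for every term in the corpus."""
    n = len(docs)
    df = Counter(term for doc in docs for term in doc)  # each Counter key counted once per doc
    return {term: math.log(n / count) for term, count in df.items()}


def cosine(u: dict, v: dict) -> float:
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0


def tfidf_kernel(smiles_list: list, q: int = 4, use_idf: bool = True) -> list:
    """Pairwise cosine similarity matrix of TF (or TF-IDF) weighted term vectors."""
    docs = [smiles_terms(s, q) for s in smiles_list]
    idf = idf_weights(docs) if use_idf else None
    vecs = [
        {t: tf * (idf[t] if idf is not None else 1.0) for t, tf in doc.items()}
        for doc in docs
    ]
    return [[cosine(a, b) for b in vecs] for a in vecs]


def composite_kernel(k1: list, k2: list, alpha: float = 0.5) -> list:
    """Element-wise weighted average of two same-shaped kernel matrices.

    ASSUMPTION: alpha=0.5 is a placeholder combination weight.
    """
    return [
        [alpha * a + (1.0 - alpha) * b for a, b in zip(row1, row2)]
        for row1, row2 in zip(k1, k2)
    ]
```

Setting `use_idf=False` gives the plain TF variant. In practice, `k2` in `composite_kernel` would be the SIMCOMP similarity matrix for the same compounds in the same order; the record does not state how the two kernels were actually combined.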
format Online
Article
Text
id pubmed-4797122
institution National Center for Biotechnology Information
language English
publishDate 2016
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-4797122, BMC Bioinformatics, Research Article, BioMed Central, 2016-03-18. © Öztürk et al. 2016. Open Access: This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
title A comparative study of SMILES-based compound similarity functions for drug-target interaction prediction
topic Research Article