Improving Measures of Chemical Structural Similarity Using Machine Learning on Chemical–Genetic Interactions

Bibliographic Details
Main authors: Safizadeh, Hamid, Simpkins, Scott W., Nelson, Justin, Li, Sheena C., Piotrowski, Jeff S., Yoshimura, Mami, Yashiroda, Yoko, Hirano, Hiroyuki, Osada, Hiroyuki, Yoshida, Minoru, Boone, Charles, Myers, Chad L.
Format: Online Article Text
Language: English
Published: American Chemical Society 2021
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8479812/
https://www.ncbi.nlm.nih.gov/pubmed/34318674
http://dx.doi.org/10.1021/acs.jcim.0c00993
author Safizadeh, Hamid
Simpkins, Scott W.
Nelson, Justin
Li, Sheena C.
Piotrowski, Jeff S.
Yoshimura, Mami
Yashiroda, Yoko
Hirano, Hiroyuki
Osada, Hiroyuki
Yoshida, Minoru
Boone, Charles
Myers, Chad L.
collection PubMed
description A common strategy for identifying molecules likely to possess a desired biological activity is to search large databases of compounds for high structural similarity to a query molecule that demonstrates this activity, under the assumption that structural similarity is predictive of similar biological activity. However, efforts to systematically benchmark the diverse array of available molecular fingerprints and similarity coefficients have been limited by a lack of large-scale datasets that reflect biological similarities of compounds. To elucidate the relative performance of these alternatives, we systematically benchmarked 11 different molecular fingerprint encodings, each combined with 13 different similarity coefficients, using a large set of chemical–genetic interaction data from the yeast Saccharomyces cerevisiae as a systematic proxy for biological activity. We found that the performance of different molecular fingerprints and similarity coefficients varied substantially and that the all-shortest path fingerprints paired with the Braun-Blanquet similarity coefficient provided superior performance that was robust across several compound collections. We further proposed a machine learning pipeline based on support vector machines that offered a fivefold improvement relative to the best unsupervised approach. Our results generally suggest that using high-dimensional chemical–genetic data as a basis for refining molecular fingerprints can be a powerful approach for improving prediction of biological functions from chemical structures.
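As a rough illustration of how a similarity coefficient scores a pair of binary fingerprints, and how a query molecule is used to rank a compound collection, the sketch below compares the Braun-Blanquet coefficient named in the abstract with the widely used Tanimoto coefficient. It assumes fingerprints are represented as Python sets of "on" bit indices; the function names, the toy fingerprints, and the convention of returning 0.0 for two empty fingerprints are illustrative assumptions. It is not the authors' benchmarking code, their all-shortest path fingerprint, or their support vector machine pipeline.

```python
# Minimal sketch (assumption): each fingerprint is a set of "on" bit indices.
# Function names and toy data are hypothetical, not taken from the paper.


def braun_blanquet(a: set[int], b: set[int]) -> float:
    """Braun-Blanquet coefficient: |A ∩ B| / max(|A|, |B|)."""
    if not a and not b:
        return 0.0  # convention assumed here for two empty fingerprints
    return len(a & b) / max(len(a), len(b))


def tanimoto(a: set[int], b: set[int]) -> float:
    """Tanimoto (Jaccard) coefficient: |A ∩ B| / |A ∪ B|."""
    union = a | b
    if not union:
        return 0.0  # same assumed empty-fingerprint convention
    return len(a & b) / len(union)


def rank_by_similarity(query, library, coefficient):
    """Score every library fingerprint against the query, most similar first."""
    scores = [(name, coefficient(query, fp)) for name, fp in library.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Made-up fingerprints for illustration only (not real compound data).
    query = {1, 4, 7, 9}
    library = {
        "cpd_a": {1, 4, 7, 12, 15, 20},
        "cpd_b": {2, 3, 9},
        "cpd_c": {1, 4, 7, 9, 30},
    }
    print(braun_blanquet(query, library["cpd_a"]))  # 3 / max(4, 6) = 0.5
    print(tanimoto(query, library["cpd_a"]))        # 3 / 7
    print(rank_by_similarity(query, library, braun_blanquet))
    # [('cpd_c', 0.8), ('cpd_a', 0.5), ('cpd_b', 0.25)]
```

Since |A ∪ B| is never smaller than max(|A|, |B|), the Braun-Blanquet score for a pair is always at least as large as its Tanimoto score; which coefficient ranks biologically similar compounds better is the empirical question the benchmark addresses.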
format Online
Article
Text
id pubmed-8479812
institution National Center for Biotechnology Information
language English
publishDate 2021
publisher American Chemical Society
record_format MEDLINE/PubMed
spelling pubmed-8479812 2021-09-30 J Chem Inf Model
American Chemical Society 2021-07-28 2021-09-27 /pmc/articles/PMC8479812/ /pubmed/34318674 http://dx.doi.org/10.1021/acs.jcim.0c00993 Text en © 2021 The Authors. Published by American Chemical Society. https://creativecommons.org/licenses/by-nc-nd/4.0/ Permits non-commercial access and re-use, provided that author attribution and integrity are maintained; but does not permit creation of adaptations or other derivative works (https://creativecommons.org/licenses/by-nc-nd/4.0/).
title Improving Measures of Chemical Structural Similarity Using Machine Learning on Chemical–Genetic Interactions