Evaluation of categorical matrix completion algorithms: toward improved active learning for drug discovery
MOTIVATION: High throughput and high content screening are extensively used to determine the effect of small molecule compounds and other potential therapeutics upon particular targets as part of the early drug development process. However, screening is typically used to find compounds that have a d...
Main authors: | Sun, Huangqingbo, Murphy, Robert F |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | Oxford University Press 2021 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8545350/ https://www.ncbi.nlm.nih.gov/pubmed/33983377 http://dx.doi.org/10.1093/bioinformatics/btab322 |
_version_ | 1784589997976846336 |
---|---|
author | Sun, Huangqingbo Murphy, Robert F |
author_facet | Sun, Huangqingbo Murphy, Robert F |
author_sort | Sun, Huangqingbo |
collection | PubMed |
description | MOTIVATION: High throughput and high content screening are extensively used to determine the effect of small molecule compounds and other potential therapeutics upon particular targets as part of the early drug development process. However, screening is typically used to find compounds that have a desired effect but not to identify potential undesirable side effects. This is because the size of the search space precludes measuring the potential effect of all compounds on all targets. Active machine learning has been proposed as a solution to this problem. RESULTS: In this article, we describe an improved imputation method, Impute by Committee, for completion of matrices containing categorical values. We compare this method to existing approaches in the context of modeling the effects of many compounds on many targets using latent similarities between compounds and conditions. We also compare these methods for the task of driving active learning in well-characterized settings for synthetic and real datasets. Our new approach performed the best overall both in the accuracy of matrix completion itself and in the number of experiments needed to train an accurate predictive model compared to random selection of experiments. We further improved upon the performance of our new method by developing an adaptive switching strategy for active learning that iteratively chooses between different matrix completion methods. AVAILABILITY AND IMPLEMENTATION: A Reproducible Research Archive containing all data and code is available at http://murphylab.cbd.cmu.edu/software. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online. |
format | Online Article Text |
id | pubmed-8545350 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2021 |
publisher | Oxford University Press |
record_format | MEDLINE/PubMed |
spelling | pubmed-8545350 2021-10-26 Evaluation of categorical matrix completion algorithms: toward improved active learning for drug discovery Sun, Huangqingbo Murphy, Robert F Bioinformatics Original Papers Oxford University Press 2021-05-13 /pmc/articles/PMC8545350/ /pubmed/33983377 http://dx.doi.org/10.1093/bioinformatics/btab322 Text en © The Author(s) 2021. Published by Oxford University Press. https://creativecommons.org/licenses/by-nc/4.0/ This is an Open Access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (https://creativecommons.org/licenses/by-nc/4.0/), which permits non-commercial re-use, distribution, and reproduction in any medium, provided the original work is properly cited. For commercial re-use, please contact journals.permissions@oup.com |
title | Evaluation of categorical matrix completion algorithms: toward improved active learning for drug discovery |
topic | Original Papers |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8545350/ https://www.ncbi.nlm.nih.gov/pubmed/33983377 http://dx.doi.org/10.1093/bioinformatics/btab322 |