
Active learning strategies with COMBINE analysis: new tricks for an old dog

The COMBINE method was designed to study congeneric series of compounds including structural information of ligand–protein complexes. Although very successful, the method has not received the same level of attention as other approaches to the study of Quantitative Structure-Activity Relationships (QSAR)...


Bibliographic Details
Main authors: Fusani, Lucia; Cabrera, Alvaro Cortes
Format: Online Article Text
Language: English
Published: Springer International Publishing 2018
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087723/
https://www.ncbi.nlm.nih.gov/pubmed/30564994
http://dx.doi.org/10.1007/s10822-018-0181-3
author Fusani, Lucia
Cabrera, Alvaro Cortes
author_facet Fusani, Lucia
Cabrera, Alvaro Cortes
author_sort Fusani, Lucia
collection PubMed
description The COMBINE method was designed to study congeneric series of compounds including structural information of ligand–protein complexes. Although very successful, the method has not received the same level of attention as other approaches to the study of Quantitative Structure-Activity Relationships (QSAR), mainly because of the lack of ways to measure the uncertainty of its predictions and the need for large datasets. Active learning, a semi-supervised learning approach that makes use of uncertainty to enhance models' performance while reducing the size of the training sets, has been used in this work to address both problems. We propose two estimators of uncertainty: the pool of regressors and the distance to the training set. The performance of the methods has been evaluated by testing the resulting active learning workflows on three diverse datasets: HIV-1 protease inhibitors, Taxol derivatives and BRD4 inhibitors. The proposed strategies were successful in 80% of the cases for the Taxol derivatives and BRD4 inhibitors, and outperformed random selection in the case of the HIV-1 protease inhibitors time-split. Our results suggest that AL-COMBINE might be an effective way of producing consistently superior QSAR models with a limited number of samples. ELECTRONIC SUPPLEMENTARY MATERIAL: The online version of this article (10.1007/s10822-018-0181-3) contains supplementary material, which is available to authorized users.
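The two uncertainty estimators named in the description can be illustrated with a minimal sketch. This is not the authors' implementation: an ordinary least-squares model stands in for the COMBINE regression, the descriptors are assumed to be plain numeric feature vectors, and all function names (fit_linear, pool_uncertainty, distance_uncertainty, active_learning) are hypothetical. The sketch assumes that "pool of regressors" means the spread of predictions across models trained on bootstrap resamples, and that "distance to the training set" means the Euclidean distance to the nearest training sample.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept (stand-in for the real regression model)."""
    A = np.column_stack([X, np.ones(len(X))])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_linear(coef, X):
    return np.column_stack([X, np.ones(len(X))]) @ coef


def pool_uncertainty(X_train, y_train, X_pool, n_models=20, rng=None):
    """Assumed 'pool of regressors': std. dev. of predictions over bootstrap models."""
    rng = np.random.default_rng(rng)
    preds = []
    for _ in range(n_models):
        # Resample the training set with replacement and refit.
        idx = rng.integers(0, len(X_train), size=len(X_train))
        coef = fit_linear(X_train[idx], y_train[idx])
        preds.append(predict_linear(coef, X_pool))
    return np.std(preds, axis=0)


def distance_uncertainty(X_train, X_pool):
    """Assumed 'distance to the training set': distance to the nearest training sample."""
    diffs = X_pool[:, None, :] - X_train[None, :, :]
    return np.linalg.norm(diffs, axis=2).min(axis=1)


def active_learning(X, y, n_init=5, n_rounds=10, estimator="pool", seed=0):
    """Start from a random subset, then repeatedly label the most uncertain sample."""
    rng = np.random.default_rng(seed)
    labeled = [int(i) for i in rng.choice(len(X), size=n_init, replace=False)]
    for _ in range(n_rounds):
        seen = set(labeled)
        pool = np.array([i for i in range(len(X)) if i not in seen])
        if estimator == "pool":
            u = pool_uncertainty(X[labeled], y[labeled], X[pool], rng=rng)
        else:
            u = distance_uncertainty(X[labeled], X[pool])
        # Query the pool sample with the highest estimated uncertainty.
        labeled.append(int(pool[np.argmax(u)]))
    return labeled


if __name__ == "__main__":
    # Synthetic data for illustration only; not taken from the article.
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 3))
    y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=40)
    print(active_learning(X, y, estimator="pool"))
    print(active_learning(X, y, estimator="distance"))
```

Comparing either selection rule against random selection at the same training-set size would mirror the kind of evaluation the description mentions, although the datasets and model here are placeholders.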
format Online
Article
Text
id pubmed-7087723
institution National Center for Biotechnology Information
language English
publishDate 2018
publisher Springer International Publishing
record_format MEDLINE/PubMed
spelling pubmed-7087723 2020-03-23 Active learning strategies with COMBINE analysis: new tricks for an old dog Fusani, Lucia Cabrera, Alvaro Cortes J Comput Aided Mol Des Article Springer International Publishing 2018-12-18 2019 /pmc/articles/PMC7087723/ /pubmed/30564994 http://dx.doi.org/10.1007/s10822-018-0181-3 Text en © Springer Nature Switzerland AG 2018 This article is made available via the PMC Open Access Subset for unrestricted research re-use and secondary analysis in any form or by any means with acknowledgement of the original source.
These permissions are granted for the duration of the World Health Organization (WHO) declaration of COVID-19 as a global pandemic.
title Active learning strategies with COMBINE analysis: new tricks for an old dog
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7087723/
https://www.ncbi.nlm.nih.gov/pubmed/30564994
http://dx.doi.org/10.1007/s10822-018-0181-3