A semi-supervised learning framework for quantitative structure–activity regression modelling

MOTIVATION: Quantitative structure–activity relationship (QSAR) methods are increasingly used in assisting the process of preclinical, small molecule drug discovery. Regression models are trained on data consisting of a finite-dimensional representation of molecular structures and their corresponding target-specific activities.

Bibliographic Details
Main authors:	Watson, Oliver; Cortes-Ciriano, Isidro; Watson, James A
Format:	Online Article Text
Language:	English
Published:	Oxford University Press 2020
Subjects:	Original Papers
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8058768/ https://www.ncbi.nlm.nih.gov/pubmed/32777821 http://dx.doi.org/10.1093/bioinformatics/btaa711

author	Watson, Oliver; Cortes-Ciriano, Isidro; Watson, James A
collection	PubMed
description	MOTIVATION: Quantitative structure–activity relationship (QSAR) methods are increasingly used in assisting the process of preclinical, small molecule drug discovery. Regression models are trained on data consisting of a finite-dimensional representation of molecular structures and their corresponding target-specific activities. These supervised learning models can then be used to predict the activity of previously unmeasured novel compounds. RESULTS: This work provides methods that solve three problems in QSAR modelling: (i) a method for comparing the information content between finite-dimensional representations of molecular structures (fingerprints) with respect to the target of interest, (ii) a method that quantifies how the accuracy of the model prediction degrades as a function of the distance between the testing and training data and (iii) a method to adjust for screening-dependent selection bias inherent in many training datasets. For example, in the most extreme cases, only compounds which pass an activity-dependent screening threshold are reported. A semi-supervised learning framework combines (ii) and (iii) and can make predictions that take into account the similarity of the testing compounds to those in the training data and adjust for the reporting selection bias. We illustrate the three methods using publicly available structure–activity data for a large set of compounds reported by GlaxoSmithKline (the Tres Cantos AntiMalarial Set, TCAMS) to inhibit asexual in vitro Plasmodium falciparum growth. AVAILABILITY AND IMPLEMENTATION: https://github.com/owatson/PenalizedPrediction. SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
format	Online Article Text
id	pubmed-8058768
institution	National Center for Biotechnology Information
language	English
publishDate	2020
publisher	Oxford University Press
record_format	MEDLINE/PubMed
spelling	pubmed-8058768 2021-04-28. Bioinformatics. Original Papers. Oxford University Press 2020-08-10. /pmc/articles/PMC8058768/ /pubmed/32777821 http://dx.doi.org/10.1093/bioinformatics/btaa711. Text en. © The Author(s) 2020. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
title	A semi-supervised learning framework for quantitative structure–activity regression modelling
topic	Original Papers
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8058768/ https://www.ncbi.nlm.nih.gov/pubmed/32777821 http://dx.doi.org/10.1093/bioinformatics/btaa711
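
Point (ii) of the abstract, quantifying how prediction accuracy degrades with distance between testing and training data, can be illustrated with a minimal sketch. This is not the authors' implementation (that is available at the GitHub repository listed in the record). The choice of Tanimoto distance on binary fingerprints, the nearest-neighbour definition of distance, the binning scheme, and all function names below are assumptions made purely for illustration.

```python
import numpy as np


def tanimoto_distance(a, b):
    """Tanimoto (Jaccard) distance between two binary fingerprint vectors."""
    a = np.asarray(a, dtype=bool)
    b = np.asarray(b, dtype=bool)
    union = np.logical_or(a, b).sum()
    if union == 0:
        return 0.0  # assumption: two empty fingerprints count as identical
    return 1.0 - np.logical_and(a, b).sum() / union


def nearest_training_distance(test_fps, train_fps):
    """Distance from each test fingerprint to its closest training fingerprint."""
    return np.array(
        [min(tanimoto_distance(t, r) for r in train_fps) for t in test_fps]
    )


def error_by_distance(distances, abs_errors, bin_edges):
    """Mean absolute prediction error within each distance bin (NaN if empty)."""
    distances = np.asarray(distances, dtype=float)
    abs_errors = np.asarray(abs_errors, dtype=float)
    # Using only the interior edges maps every value to a bin index 0..n_bins-1.
    idx = np.digitize(distances, bin_edges[1:-1])
    n_bins = len(bin_edges) - 1
    return np.array(
        [
            abs_errors[idx == k].mean() if np.any(idx == k) else np.nan
            for k in range(n_bins)
        ]
    )


# Toy data (hypothetical): two training and two test fingerprints.
train_fps = [[1, 1, 0, 0], [0, 0, 1, 1]]
test_fps = [[1, 1, 0, 0], [1, 0, 1, 0]]
dists = nearest_training_distance(test_fps, train_fps)

# Hypothetical absolute errors of some regression model on the test compounds.
abs_errors = [0.1, 0.9]
profile = error_by_distance(dists, abs_errors, bin_edges=[0.0, 0.5, 1.0])
```

Here `profile` holds one mean error per distance bin, so a rising profile would indicate that predictions become less reliable for compounds far from the training set.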
