Additive SMILES-Based Carcinogenicity Models: Probabilistic Principles in the Search for Robust Predictions

Optimal descriptors calculated with the simplified molecular input line entry system (SMILES) have been utilized in modeling of carcinogenicity as continuous values (logTD(50)). These descriptors can be calculated using correlation weights of SMILES attributes calculated by the Monte Carlo method. A...

Bibliographic Details
Main authors:	Toropov, Andrey A., Toropova, Alla P., Benfenati, Emilio
Format:	Text
Language:	English
Published:	Molecular Diversity Preservation International (MDPI) 2009
Subjects:	Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2738914/ https://www.ncbi.nlm.nih.gov/pubmed/19742127 http://dx.doi.org/10.3390/ijms10073106

author	Toropov, Andrey A. Toropova, Alla P. Benfenati, Emilio
collection	PubMed
description	Optimal descriptors calculated from the simplified molecular input line entry system (SMILES) have been used to model carcinogenicity as a continuous value (logTD50). These descriptors are calculated from correlation weights of SMILES attributes, and the weights are obtained by the Monte Carlo method. A considerable share of these attributes are rare, and using rare attributes can lead to overtraining. Their influence can be avoided by fixing their correlation weights to zero. A threshold, limS, has been defined to identify rare attributes: limS is the minimum number of occurrences in the structures of the training (subtraining) set that an attribute needs in order to be accepted as usable. An attribute that occurs fewer than limS times is considered "rare" and is not used. Two systems for building models were examined: (1) the classic training-test system; (2) the balance of correlations for the subtraining and calibration sets (together these form the original training set; the calibration set acts as a preliminary test set). Three random splits into subtraining, calibration, and test sets were analysed. A comparison of the two systems shows that the balance of correlations gives more robust predictions of carcinogenicity for all three splits (split 1: r²(test) = 0.7514, s(test) = 0.684; split 2: r²(test) = 0.7998, s(test) = 0.600; split 3: r²(test) = 0.7192, s(test) = 0.728).
format	Text
id	pubmed-2738914
institution	National Center for Biotechnology Information
language	English
publishDate	2009
publisher	Molecular Diversity Preservation International (MDPI)
record_format	MEDLINE/PubMed
spelling	pubmed-2738914 2009-09-08 Int J Mol Sci Article Molecular Diversity Preservation International (MDPI) 2009-07-08 /pmc/articles/PMC2738914/ /pubmed/19742127 http://dx.doi.org/10.3390/ijms10073106 Text en © 2009 by the authors; licensee Molecular Diversity Preservation International, Basel, Switzerland. This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).
topic	Article
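
The limS rule in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The tokenizer, the function names, and the numeric weights are hypothetical placeholders, and the Monte Carlo optimization that would produce real correlation weights is not shown. The sketch also assumes that "occurrences" means the number of training structures containing an attribute, which the abstract does not state explicitly.

```python
from collections import Counter


def tokenize(smiles):
    """Split a SMILES string into simple attributes.

    Assumption: one character per attribute, except the two-letter
    atoms Cl and Br. The paper's attribute scheme may differ.
    """
    tokens, i = [], 0
    while i < len(smiles):
        if smiles[i:i + 2] in ("Cl", "Br"):
            tokens.append(smiles[i:i + 2])
            i += 2
        else:
            tokens.append(smiles[i])
            i += 1
    return tokens


def attribute_counts(training_smiles):
    """Count in how many training structures each attribute occurs."""
    counts = Counter()
    for s in training_smiles:
        counts.update(set(tokenize(s)))  # set(): count once per structure
    return counts


def zero_rare_weights(weights, counts, lim_s):
    """Fix to zero the weight of every attribute seen in fewer than lim_s structures."""
    return {a: (w if counts.get(a, 0) >= lim_s else 0.0)
            for a, w in weights.items()}


def descriptor(smiles, weights):
    """Additive descriptor: sum of correlation weights over the SMILES attributes."""
    return sum(weights.get(t, 0.0) for t in tokenize(smiles))


# Toy training set and placeholder weights (illustrative values only).
training = ["CCO", "CCN", "CCCl", "c1ccccc1Br"]
counts = attribute_counts(training)
raw_weights = {"C": 0.5, "O": 1.0, "N": -0.25, "Cl": 2.0,
               "Br": 3.0, "c": 0.25, "1": 0.0}
weights = zero_rare_weights(raw_weights, counts, lim_s=2)

print(descriptor("CCO", raw_weights))  # rare attribute "O" still contributes
print(descriptor("CCO", weights))      # "O" is rare, so its weight is zero
```

In this toy set only `C` occurs in at least two structures, so with `lim_s=2` every other attribute is treated as rare and stops contributing to the descriptor.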