
In-silico predictive mutagenicity model generation using supervised learning approaches

BACKGROUND: Experimental screening of chemical compounds for biological activity is a time-consuming and expensive practice. In silico predictive models permit inexpensive, rapid “virtual screening” to prioritize selection of compounds for experimental testing. Both experimental and in silico screen...


Bibliographic Details
Main Authors:	Seal, Abhik; Passi, Anurag; Jaleel, UC Abdul; Wild, David J
Format:	Online Article Text
Language:	English
Published:	BioMed Central 2012
Subjects:	Methodology
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3542175/ https://www.ncbi.nlm.nih.gov/pubmed/22587596 http://dx.doi.org/10.1186/1758-2946-4-10

author	Seal, Abhik; Passi, Anurag; Jaleel, UC Abdul; Wild, David J
collection	PubMed
description	BACKGROUND: Experimental screening of chemical compounds for biological activity is a time-consuming and expensive practice. In silico predictive models permit inexpensive, rapid “virtual screening” to prioritize selection of compounds for experimental testing. Both experimental and in silico screening can be used to test compounds for desirable or undesirable properties. Prior work on prediction of mutagenicity has primarily involved identification of toxicophores rather than whole-molecule predictive models. In this work, we examined a range of in silico predictive classification models for prediction of mutagenic properties of compounds, including methods such as J48 and SMO, which have not previously been widely applied in cheminformatics. RESULTS: The Bursi mutagenicity data set containing 4337 compounds (Set 1) and a Benchmark data set of 6512 compounds (Set 2) were used as input data sets in this work. A third data set (Set 3) was prepared by combining the previous two sets. Classification algorithms including Naïve Bayes, Random Forest, J48 and SMO, with 10-fold cross-validation and default parameters, were used for model generation on these data sets. Models built using the combined data set performed better than those developed from the Benchmark data set. Significantly, Random Forest outperformed the other classifiers on all the data sets, especially on Set 3, with 89.27% accuracy, 89% precision and an ROC area of 95.3%. To validate the developed models, two external data sets with mutagenicity data, AID1189 and AID1194, were tested, showing 62% accuracy, 67% precision and 65% ROC area for AID1189 and 91% accuracy, 91% precision and 96.3% ROC area for AID1194. A Random Forest model was also applied to approved drugs from DrugBank and metabolites from the Zinc Database, giving a true positive rate of almost 85%, which indicates the robustness of the model. CONCLUSION: We have created a new mutagenicity benchmark data set with around 8,000 compounds. Our work shows that highly accurate predictive mutagenicity models can be built using machine learning methods based on chemical descriptors and trained on this set, and that these models complement toxicophore-based methods. Further, our work supports other recent literature in showing that Random Forest models generally outperform other comparable machine learning methods for this kind of application.
format	Online Article Text
id	pubmed-3542175
institution	National Center for Biotechnology Information
language	English
publishDate	2012
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-3542175 2013-01-11 In-silico predictive mutagenicity model generation using supervised learning approaches Seal, Abhik Passi, Anurag Jaleel, UC Abdul Wild, David J J Cheminform Methodology BioMed Central 2012-05-15 /pmc/articles/PMC3542175/ /pubmed/22587596 http://dx.doi.org/10.1186/1758-2946-4-10 Text en Copyright ©2012 Seal et al.; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	In-silico predictive mutagenicity model generation using supervised learning approaches
topic	Methodology
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3542175/ https://www.ncbi.nlm.nih.gov/pubmed/22587596 http://dx.doi.org/10.1186/1758-2946-4-10
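
The evaluation protocol named in the abstract (10-fold cross-validation, scored by accuracy, precision and true positive rate) can be illustrated with a minimal sketch. This is not the authors' workflow: the record does not say how the folds were built, so the stratified round-robin split is an assumption, and `fit_threshold`, `toy_x` and `toy_y` are hypothetical stand-ins for a real classifier and real descriptor data. ROC area is left out to keep the sketch short.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple


def stratified_k_folds(labels: Sequence[int], k: int = 10, seed: int = 0) -> List[List[int]]:
    """Split sample indices into k folds with roughly equal class proportions."""
    rng = random.Random(seed)
    folds: List[List[int]] = [[] for _ in range(k)]
    for cls in sorted(set(labels)):
        idx = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(idx)
        # Deal the indices of this class round-robin across the folds.
        for pos, i in enumerate(idx):
            folds[pos % k].append(i)
    return folds


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> Tuple[int, int, int, int]:
    """Return (tp, fp, tn, fn) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def metrics(tp: int, fp: int, tn: int, fn: int) -> Dict[str, float]:
    """Accuracy, precision and true positive rate from confusion counts."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "tpr": tp / (tp + fn) if tp + fn else 0.0,
    }


def fit_threshold(xs: Sequence[float], ys: Sequence[int]) -> Callable[[float], int]:
    """Hypothetical toy classifier: cut one feature midway between the class means."""
    pos = [x for x, y in zip(xs, ys) if y == 1]
    neg = [x for x, y in zip(xs, ys) if y == 0]
    cut = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    return lambda x: 1 if x > cut else 0


def cross_validate(features: Sequence[float], labels: Sequence[int],
                   fit: Callable, k: int = 10, seed: int = 0) -> Dict[str, float]:
    """Train on k-1 folds, predict the held-out fold, and score the pooled predictions."""
    y_true: List[int] = []
    y_pred: List[int] = []
    for held_out in stratified_k_folds(labels, k, seed):
        held = set(held_out)
        train = [i for i in range(len(labels)) if i not in held]
        predict = fit([features[i] for i in train], [labels[i] for i in train])
        for i in held_out:
            y_true.append(labels[i])
            y_pred.append(predict(features[i]))
    return metrics(*confusion_counts(y_true, y_pred))


# Made-up, perfectly separable toy data (NOT mutagenicity data).
toy_x = [i * 0.1 for i in range(20)] + [3.0 + i * 0.1 for i in range(20)]
toy_y = [0] * 20 + [1] * 20
scores = cross_validate(toy_x, toy_y, fit_threshold, k=10)
print(scores)
```

Because the toy data is separable, every held-out sample is classified correctly here; with real descriptors, `fit` would be replaced by a proper learner such as a random forest, and the scores would reflect genuine generalization error.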
