Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization

This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis on datasets previously studied, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of...

Bibliographic Details
Main Authors:	Ma, Eddie YT, Cameron, Christopher JF, Kremer, Stefan C
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Research
Online Access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2966291/ https://www.ncbi.nlm.nih.gov/pubmed/21034429 http://dx.doi.org/10.1186/1471-2105-11-S8-S4

_version_	1782189566857314304
author	Ma, Eddie YT Cameron, Christopher JF Kremer, Stefan C
author_facet	Ma, Eddie YT Cameron, Christopher JF Kremer, Stefan C
author_sort	Ma, Eddie YT
collection	PubMed
description	This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of previously studied datasets, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new and the previously studied datasets. This methodology is rigorous and statistically grounded, and culminates in a Wilcoxon significance test that proves the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets using a mathematical abstraction that allows researchers in different domains to apply the method to their own work. BACKGROUND: Our work can be viewed as an alternative to existing methods to solve the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches from both a methodological and a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology. RESULTS: Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems. CONCLUSIONS: We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of molecular description languages to facilitate the identification of relevant structural attributes of the molecules over different problem domains.
format	Text
id	pubmed-2966291
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2966291 2010-10-30 Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization Ma, Eddie YT Cameron, Christopher JF Kremer, Stefan C BMC Bioinformatics Research This paper demonstrates how a Neural Grammar Network learns to classify and score molecules for a variety of tasks in chemistry and toxicology. In addition to a more detailed analysis of previously studied datasets, we introduce three new datasets (BBB, FXa, and toxicology) to show the generality of the approach. A new experimental methodology is developed and applied to both the new and the previously studied datasets. This methodology is rigorous and statistically grounded, and culminates in a Wilcoxon significance test that proves the effectiveness of the system. We further include a complete generalization of the specific technique to arbitrary grammars and datasets using a mathematical abstraction that allows researchers in different domains to apply the method to their own work. BACKGROUND: Our work can be viewed as an alternative to existing methods to solve the quantitative structure-activity relationship (QSAR) problem. To this end, we review a number of approaches from both a methodological and a performance perspective. In addition to these approaches, we also examined a number of chemical properties that can be used by generic classifier systems, such as feed-forward artificial neural networks. In studying these approaches, we identified a set of interesting benchmark problem sets to which many of the above approaches had been applied. These included: ACE, AChE, AR, BBB, BZR, Cox2, DHFR, ER, FXa, GPB, Therm, and Thr. Finally, we developed our own benchmark set by collecting data on toxicology. RESULTS: Our results show that our system performs better than, or comparably to, the existing methods over a broad range of problem types. Our method does not require the expert knowledge that is necessary to apply the other methods to novel problems. CONCLUSIONS: We conclude that our success is due to the ability of our system to: 1) encode molecules losslessly before presentation to the learning system, and 2) leverage the design of molecular description languages to facilitate the identification of relevant structural attributes of the molecules over different problem domains. BioMed Central 2010-10-26 /pmc/articles/PMC2966291/ /pubmed/21034429 http://dx.doi.org/10.1186/1471-2105-11-S8-S4 Text en Copyright ©2010 Kremer et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Ma, Eddie YT Cameron, Christopher JF Kremer, Stefan C Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization
title	Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization
title_full	Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization
title_fullStr	Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization
title_full_unstemmed	Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization
title_short	Classifying and scoring of molecules with the NGN: new datasets, significance tests, and generalization
title_sort	classifying and scoring of molecules with the ngn: new datasets, significance tests, and generalization
topic	Research
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2966291/ https://www.ncbi.nlm.nih.gov/pubmed/21034429 http://dx.doi.org/10.1186/1471-2105-11-S8-S4
work_keys_str_mv	AT maeddieyt classifyingandscoringofmoleculeswiththengnnewdatasetssignificancetestsandgeneralization AT cameronchristopherjf classifyingandscoringofmoleculeswiththengnnewdatasetssignificancetestsandgeneralization AT kremerstefanc classifyingandscoringofmoleculeswiththengnnewdatasetssignificancetestsandgeneralization
