Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem

BACKGROUND: Recent advances in high-throughput screening (HTS) techniques and readily available compound libraries generated using combinatorial chemistry or derived from natural products enable the testing of millions of compounds in a matter of days. Due to the amount of information produced by HT...

Bibliographic Details
Main authors:	Han, Lianyi, Wang, Yanli, Bryant, Stephen H
Format:	Text
Language:	English
Published:	BioMed Central 2008
Subjects:	Research Article
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2572623/ https://www.ncbi.nlm.nih.gov/pubmed/18817552 http://dx.doi.org/10.1186/1471-2105-9-401

_version_	1782160262409748480
author	Han, Lianyi Wang, Yanli Bryant, Stephen H
author_facet	Han, Lianyi Wang, Yanli Bryant, Stephen H
author_sort	Han, Lianyi
collection	PubMed
description	BACKGROUND: Recent advances in high-throughput screening (HTS) techniques and readily available compound libraries generated using combinatorial chemistry or derived from natural products enable the testing of millions of compounds in a matter of days. Because of the amount of information produced by HTS assays, mining HTS data for information of potential interest to drug development research is very challenging. Computational approaches to the analysis of HTS results face great challenges because of the large quantity of information and the significant amount of erroneous data produced. RESULTS: In this study, decision tree (DT) based models were developed to discriminate compound bioactivities using the chemical structure fingerprints provided in the PubChem system. The DT models were examined for filtering biological activity data contained in four assays deposited in the PubChem Bioassay Database, including assays tested for 5HT1a agonists, antagonists, and HIV-1 RT-RNase H inhibitors. The 10-fold cross-validation (CV) sensitivity, specificity, and Matthews correlation coefficient (MCC) for the models are 57.2–80.5%, 97.3–99.0%, and 0.4–0.5, respectively. A further evaluation was performed for DT models built for two independent bioassays in which inhibitors of the same HIV RNase target were screened using different compound libraries; this experiment yielded enrichment factors of 4.4 and 9.7. CONCLUSION: Our results suggest that the designed DT models can be used as a virtual screening technique as well as a complement to traditional approaches for hit selection.
format	Text
id	pubmed-2572623
institution	National Center for Biotechnology Information
language	English
publishDate	2008
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2572623 2008-10-27 Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem Han, Lianyi Wang, Yanli Bryant, Stephen H BMC Bioinformatics Research Article
BioMed Central 2008-09-25 /pmc/articles/PMC2572623/ /pubmed/18817552 http://dx.doi.org/10.1186/1471-2105-9-401 Text en Copyright © 2008 Han et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
spellingShingle	Research Article Han, Lianyi Wang, Yanli Bryant, Stephen H Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem
title	Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem
title_full	Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem
title_fullStr	Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem
title_full_unstemmed	Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem
title_short	Developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in PubChem
title_sort	developing and validating predictive decision tree models from mining chemical structural fingerprints and high–throughput screening data in pubchem
topic	Research Article
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2572623/ https://www.ncbi.nlm.nih.gov/pubmed/18817552 http://dx.doi.org/10.1186/1471-2105-9-401
work_keys_str_mv	AT hanlianyi developingandvalidatingpredictivedecisiontreemodelsfromminingchemicalstructuralfingerprintsandhighthroughputscreeningdatainpubchem AT wangyanli developingandvalidatingpredictivedecisiontreemodelsfromminingchemicalstructuralfingerprintsandhighthroughputscreeningdatainpubchem AT bryantstephenh developingandvalidatingpredictivedecisiontreemodelsfromminingchemicalstructuralfingerprintsandhighthroughputscreeningdatainpubchem
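
The abstract in this record reports sensitivity, specificity, Matthews correlation coefficient (MCC), and enrichment factor for the DT models. As a rough illustration of how such figures are typically derived from a confusion matrix, the sketch below uses the standard textbook definitions. The enrichment factor shown here (hit rate among predicted actives divided by the hit rate of the whole library) is a common convention and is assumed, not confirmed, to match the one used in the article. The function names and the counts are hypothetical and are not taken from the paper.

```python
import math


def sensitivity(tp: int, fn: int) -> float:
    """Fraction of true actives that the model labels active."""
    return tp / (tp + fn)


def specificity(tn: int, fp: int) -> float:
    """Fraction of true inactives that the model labels inactive."""
    return tn / (tn + fp)


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def enrichment_factor(tp: int, tn: int, fp: int, fn: int) -> float:
    """Hit rate among predicted actives relative to the library-wide hit rate.

    Assumed definition: (TP / (TP + FP)) / ((TP + FN) / N).
    """
    total = tp + tn + fp + fn
    hit_rate_selected = tp / (tp + fp)
    hit_rate_overall = (tp + fn) / total
    return hit_rate_selected / hit_rate_overall


if __name__ == "__main__":
    # Hypothetical counts for an imbalanced screen (not data from the article).
    tp, fn, tn, fp = 60, 40, 9800, 100
    print(f"sensitivity = {sensitivity(tp, fn):.3f}")
    print(f"specificity = {specificity(tn, fp):.3f}")
    print(f"MCC         = {mcc(tp, tn, fp, fn):.3f}")
    print(f"enrichment  = {enrichment_factor(tp, tn, fp, fn):.1f}")
```

With HTS data, inactives usually far outnumber actives, which is why specificity can be very high while sensitivity and MCC remain moderate, as in the ranges quoted in the abstract.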

Similar Items