QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem

Bibliographic Details
Main authors: Zakharov, Alexey V., Peach, Megan L., Sitzmann, Markus, Nicklaus, Marc C.
Format: Online article (text)
Language: English
Published: American Chemical Society 2014
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3985743/
https://www.ncbi.nlm.nih.gov/pubmed/24524735
http://dx.doi.org/10.1021/ci400737s
Collection: PubMed
Abstract: Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and “biological” descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap).
ID: pubmed-3985743
Institution: National Center for Biotechnology Information
Record format: MEDLINE/PubMed
Journal: J Chem Inf Model (published online 2014-02-13; issue date 2014-03-24)
Copyright © 2014 American Chemical Society