QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem
Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large nu...
Main authors: | Zakharov, Alexey V.; Peach, Megan L.; Sitzmann, Markus; Nicklaus, Marc C. |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | American Chemical Society 2014 |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3985743/ https://www.ncbi.nlm.nih.gov/pubmed/24524735 http://dx.doi.org/10.1021/ci400737s |
_version_ | 1782311621014585344 |
---|---|
author | Zakharov, Alexey V. Peach, Megan L. Sitzmann, Markus Nicklaus, Marc C. |
author_facet | Zakharov, Alexey V. Peach, Megan L. Sitzmann, Markus Nicklaus, Marc C. |
author_sort | Zakharov, Alexey V. |
collection | PubMed |
description | Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and “biological” descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap). |
format | Online Article Text |
id | pubmed-3985743 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2014 |
publisher | American Chemical Society |
record_format | MEDLINE/PubMed |
spelling | pubmed-3985743 2015-02-13 QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem Zakharov, Alexey V. Peach, Megan L. Sitzmann, Markus Nicklaus, Marc C. J Chem Inf Model Many of the structures in PubChem are annotated with activities determined in high-throughput screening (HTS) assays. Because of the nature of these assays, the activity data are typically strongly imbalanced, with a small number of active compounds contrasting with a very large number of inactive compounds. We have used several such imbalanced PubChem HTS assays to test and develop strategies to efficiently build robust QSAR models from imbalanced data sets. Different descriptor types [Quantitative Neighborhoods of Atoms (QNA) and “biological” descriptors] were used to generate a variety of QSAR models in the program GUSAR. The models obtained were compared using external test and validation sets. We also report on our efforts to incorporate the most predictive of our models in the publicly available NCI/CADD Group Web services (http://cactus.nci.nih.gov/chemical/apps/cap). American Chemical Society 2014-02-13 2014-03-24 /pmc/articles/PMC3985743/ /pubmed/24524735 http://dx.doi.org/10.1021/ci400737s Text en Copyright © 2014 American Chemical Society |
spellingShingle | Zakharov, Alexey V. Peach, Megan L. Sitzmann, Markus Nicklaus, Marc C. QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem |
title | QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem |
title_full | QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem |
title_fullStr | QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem |
title_full_unstemmed | QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem |
title_short | QSAR Modeling of Imbalanced High-Throughput Screening Data in PubChem |
title_sort | qsar modeling of imbalanced high-throughput screening data in pubchem |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3985743/ https://www.ncbi.nlm.nih.gov/pubmed/24524735 http://dx.doi.org/10.1021/ci400737s |
work_keys_str_mv | AT zakharovalexeyv qsarmodelingofimbalancedhighthroughputscreeningdatainpubchem AT peachmeganl qsarmodelingofimbalancedhighthroughputscreeningdatainpubchem AT sitzmannmarkus qsarmodelingofimbalancedhighthroughputscreeningdatainpubchem AT nicklausmarcc qsarmodelingofimbalancedhighthroughputscreeningdatainpubchem |