High-Throughput Screening Assay Datasets from the PubChem Database
Availability of high-throughput screening (HTS) data in the public domain offers great potential to foster development of ligand-based computer-aided drug discovery (LB-CADD) methods crucial for drug discovery efforts in academia and industry. LB-CADD method development depends on high-quality HTS a...
Main authors: | Butkiewicz, Mariusz; Wang, Yanli; Bryant, Stephen H; Lowe, Edward W; Weaver, David C; Meiler, Jens |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | 2017 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5962024/ https://www.ncbi.nlm.nih.gov/pubmed/29795804 |
_version_ | 1783324826306347008 |
---|---|
author | Butkiewicz, Mariusz Wang, Yanli Bryant, Stephen H Lowe, Edward W Weaver, David C Meiler, Jens |
author_facet | Butkiewicz, Mariusz Wang, Yanli Bryant, Stephen H Lowe, Edward W Weaver, David C Meiler, Jens |
author_sort | Butkiewicz, Mariusz |
collection | PubMed |
description | Availability of high-throughput screening (HTS) data in the public domain offers great potential to foster development of ligand-based computer-aided drug discovery (LB-CADD) methods crucial for drug discovery efforts in academia and industry. LB-CADD method development depends on high-quality HTS assay data, i.e., datasets that contain both active and inactive compounds. These active compounds are hits from primary screens that have been tested in concentration-response experiments and where the target-specificity of the hits has been validated through suitable secondary screening experiments. Publicly available HTS repositories such as PubChem often provide such data in a convoluted way: compounds that are classified as inactive need to be extracted from the primary screening record. However, compounds classified as active in the primary screening record are not suitable as a set of active compounds for LB-CADD experiments due to their high false-positive rate. A suitable set of actives can be derived by carefully analysing the results of the often five or more assays that are used to confirm and classify the activity of compounds. These assays, in part, build on each other. However, often not all hit compounds from the previous screen have been tested. Sometimes a compound is classified as ‘active’ in a record even though it is ‘inactive’ on the target of interest, because it is ‘active’ on a different target protein. Here, a curation process of hierarchically related confirmatory screens is illustrated based on two specifically chosen protein use-cases. The subsequent re-upload procedure into PubChem is described for the findings of those two scenarios. Further, we provide nine publicly accessible high-quality datasets for future LB-CADD method development that provide a common baseline for comparison of future methods to the scientific community. We also provide a protocol researchers can follow to upload additional datasets for benchmarking. |
format | Online Article Text |
id | pubmed-5962024 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2017 |
record_format | MEDLINE/PubMed |
spelling | pubmed-59620242018-05-21 High-Throughput Screening Assay Datasets from the PubChem Database Butkiewicz, Mariusz Wang, Yanli Bryant, Stephen H Lowe, Edward W Weaver, David C Meiler, Jens Chem Inform Article Availability of high-throughput screening (HTS) data in the public domain offers great potential to foster development of ligand-based computer-aided drug discovery (LB-CADD) methods crucial for drug discovery efforts in academia and industry. LB-CADD method development depends on high-quality HTS assay data, i.e., datasets that contain both active and inactive compounds. These active compounds are hits from primary screens that have been tested in concentration-response experiments and where the target-specificity of the hits has been validated through suitable secondary screening experiments. Publicly available HTS repositories such as PubChem often provide such data in a convoluted way: compounds that are classified as inactive need to be extracted from the primary screening record. However, compounds classified as active in the primary screening record are not suitable as a set of active compounds for LB-CADD experiments due to their high false-positive rate. A suitable set of actives can be derived by carefully analysing the results of the often five or more assays that are used to confirm and classify the activity of compounds. These assays, in part, build on each other. However, often not all hit compounds from the previous screen have been tested. Sometimes a compound is classified as ‘active’ in a record even though it is ‘inactive’ on the target of interest, because it is ‘active’ on a different target protein. Here, a curation process of hierarchically related confirmatory screens is illustrated based on two specifically chosen protein use-cases. The subsequent re-upload procedure into PubChem is described for the findings of those two scenarios. Further, we provide nine publicly accessible high-quality datasets for future LB-CADD method development that provide a common baseline for comparison of future methods to the scientific community. We also provide a protocol researchers can follow to upload additional datasets for benchmarking. 2017-04-26 2017 /pmc/articles/PMC5962024/ /pubmed/29795804 Text en © Under License of Creative Commons (http://creativecommons.org/licenses/by-nc-nd/4.0/) Attribution 3.0 License |
spellingShingle | Article Butkiewicz, Mariusz Wang, Yanli Bryant, Stephen H Lowe, Edward W Weaver, David C Meiler, Jens High-Throughput Screening Assay Datasets from the PubChem Database |
title | High-Throughput Screening Assay Datasets from the PubChem Database |
title_full | High-Throughput Screening Assay Datasets from the PubChem Database |
title_fullStr | High-Throughput Screening Assay Datasets from the PubChem Database |
title_full_unstemmed | High-Throughput Screening Assay Datasets from the PubChem Database |
title_short | High-Throughput Screening Assay Datasets from the PubChem Database |
title_sort | high-throughput screening assay datasets from the pubchem database |
topic | Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5962024/ https://www.ncbi.nlm.nih.gov/pubmed/29795804 |
work_keys_str_mv | AT butkiewiczmariusz highthroughputscreeningassaydatasetsfromthepubchemdatabase AT wangyanli highthroughputscreeningassaydatasetsfromthepubchemdatabase AT bryantstephenh highthroughputscreeningassaydatasetsfromthepubchemdatabase AT loweedwardw highthroughputscreeningassaydatasetsfromthepubchemdatabase AT weaverdavidc highthroughputscreeningassaydatasetsfromthepubchemdatabase AT meilerjens highthroughputscreeningassaydatasetsfromthepubchemdatabase |
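
The curation logic summarized in the description field can be illustrated with a small Python sketch. It is not the authors' protocol and does not use any PubChem API: the function name `curate_sets`, the dictionary layout (compound ID mapped to an outcome string), and the decision to drop hits that were never retested are all assumptions made for illustration. Inactives are taken from the primary screen, while actives must survive every confirmatory screen in order and must not be active in a counter screen against a different target.

```python
from typing import Dict, Iterable, Set, Tuple

Outcome = Dict[int, str]  # hypothetical layout: compound ID -> "active" / "inactive"


def curate_sets(
    primary: Outcome,
    confirmatory: Iterable[Outcome],
    counter_screens: Iterable[Outcome] = (),
) -> Tuple[Set[int], Set[int]]:
    """Return (actives, inactives) from hierarchically related screens.

    Assumption: a primary hit that was not retested in a confirmatory
    screen is dropped rather than kept as an unconfirmed active.
    """
    # Inactive compounds come directly from the primary screening record.
    inactives = {cid for cid, out in primary.items() if out == "inactive"}

    # Primary hits are only candidates because of the false-positive rate.
    candidates = {cid for cid, out in primary.items() if out == "active"}

    # Each confirmatory screen builds on the previous one: keep only
    # candidates that were tested in it and came out active.
    for screen in confirmatory:
        candidates = {cid for cid in candidates if screen.get(cid) == "active"}

    # "Active" in a counter screen means activity on a different target,
    # so such compounds are removed from the final active set.
    for screen in counter_screens:
        candidates = {cid for cid in candidates if screen.get(cid) != "active"}

    return candidates, inactives


if __name__ == "__main__":
    primary = {1: "active", 2: "active", 3: "inactive", 4: "active", 5: "active"}
    confirm = {1: "active", 2: "inactive", 4: "active"}  # compound 5 was not retested
    counter = {1: "inactive", 4: "active"}  # compound 4 hits another target
    print(curate_sets(primary, [confirm], [counter]))
```

Whether untested hits should be discarded, or kept in a separate "unconfirmed" set, is a design choice the record does not settle; the sketch takes the stricter option.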