High-Throughput Screening Assay Datasets from the PubChem Database

Availability of high-throughput screening (HTS) data in the public domain offers great potential to foster development of ligand-based computer-aided drug discovery (LB-CADD) methods crucial for drug discovery efforts in academia and industry. LB-CADD method development depends on high-quality HTS a...

Bibliographic Details
Main Authors: Butkiewicz, Mariusz; Wang, Yanli; Bryant, Stephen H; Lowe, Edward W; Weaver, David C; Meiler, Jens
Format: Online Article Text
Language: English
Published: 2017
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5962024/
https://www.ncbi.nlm.nih.gov/pubmed/29795804
author Butkiewicz, Mariusz
Wang, Yanli
Bryant, Stephen H
Lowe, Edward W
Weaver, David C
Meiler, Jens
collection PubMed
description The availability of high-throughput screening (HTS) data in the public domain offers great potential to foster the development of ligand-based computer-aided drug discovery (LB-CADD) methods, which are crucial for drug discovery efforts in academia and industry. LB-CADD method development depends on high-quality HTS assay data, i.e., datasets that contain both active and inactive compounds. The active compounds are hits from primary screens that have been tested in concentration-response experiments and whose target specificity has been validated through suitable secondary screening experiments. Publicly available HTS repositories such as PubChem often provide such data in a convoluted way: compounds that are classified as inactive need to be extracted from the primary screening record. However, compounds classified as active in the primary screening record are not suitable as a set of active compounds for LB-CADD experiments because of the high false-positive rate. A suitable set of actives can be derived by carefully analysing the results of the assays, often five or more, that are used to confirm and classify the activity of compounds. These assays, in part, build on each other. However, often not all hit compounds from the previous screen have been tested. Sometimes a compound is classified as ‘active’ in an assay even though this outcome means ‘inactive’ on the target of interest, because the compound is ‘active’ on a different target protein. Here, a curation process for hierarchically related confirmatory screens is illustrated using two specifically chosen protein use cases. The subsequent re-upload procedure into PubChem is described for the findings of those two scenarios. Further, we provide nine publicly accessible, high-quality datasets for future LB-CADD method development, giving the scientific community a common baseline for comparing future methods. We also provide a protocol that researchers can follow to upload additional datasets for benchmarking.
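The curation logic summarized in the description can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the authors' published protocol: the function name `curate_dataset`, the dictionary layout (compound ID mapped to an "active" or "inactive" outcome), and the rule that hits not tested in a later screen are left out of both sets are all hypothetical choices made for this example.

```python
from typing import Dict, Iterable, Set, Tuple

# Hypothetical layout: compound ID -> outcome string ("active" or "inactive").
Outcome = Dict[int, str]


def curate_dataset(
    primary: Outcome,
    confirmatory: Iterable[Outcome],
    counter: Iterable[Outcome] = (),
) -> Tuple[Set[int], Set[int]]:
    """Return (actives, inactives) from a hierarchy of screens.

    Assumptions made for illustration only:
    - inactives are taken from the primary screening record;
    - a primary hit stays active only if every confirmatory screen
      tested it and reported it as active;
    - a hit that is active in a counter screen (a different target)
      is dropped, because it is not specific to the target of interest.
    """
    inactives = {cid for cid, res in primary.items() if res == "inactive"}
    candidates = {cid for cid, res in primary.items() if res == "active"}

    for screen in confirmatory:
        # Untested compounds (missing keys) are dropped along with failures.
        candidates = {cid for cid in candidates if screen.get(cid) == "active"}

    for screen in counter:
        # Activity on an off-target disqualifies the compound.
        candidates = {cid for cid in candidates if screen.get(cid) != "active"}

    return candidates, inactives


if __name__ == "__main__":
    primary = {1: "active", 2: "active", 3: "active", 4: "inactive", 5: "active"}
    confirm = {1: "active", 2: "inactive", 3: "active"}  # compound 5 not retested
    off_target = {1: "inactive", 3: "active"}
    actives, inactives = curate_dataset(primary, [confirm], [off_target])
    print(sorted(actives), sorted(inactives))
```

In this toy input, compound 2 fails confirmation, compound 5 was never retested, and compound 3 is active on the off-target, so only compound 1 remains as a validated active. Leaving unconfirmed hits out of both sets is a design choice for this sketch; it avoids labelling a possible false positive as either active or inactive.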
format Online
Article
Text
id pubmed-5962024
institution National Center for Biotechnology Information
language English
publishDate 2017
record_format MEDLINE/PubMed
spelling pubmed-5962024 2018-05-21 High-Throughput Screening Assay Datasets from the PubChem Database Butkiewicz, Mariusz; Wang, Yanli; Bryant, Stephen H; Lowe, Edward W; Weaver, David C; Meiler, Jens Chem Inform Article 2017-04-26 2017 /pmc/articles/PMC5962024/ /pubmed/29795804 Text en © Under License of Creative Commons (http://creativecommons.org/licenses/by-nc-nd/4.0/) Attribution 3.0 License
title High-Throughput Screening Assay Datasets from the PubChem Database
topic Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5962024/
https://www.ncbi.nlm.nih.gov/pubmed/29795804