Benchmarking Data Sets from PubChem BioAssay Data: Current Scenario and Room for Improvement

Developing realistic data sets for evaluating virtual screening methods is a task that has been tackled by the cheminformatics community for many years. Numerous artificially constructed data collections were developed, such as DUD, DUD-E, or DEKOIS. However, they all suffer from multiple drawbacks,...

Bibliographic Details
Main authors: Tran-Nguyen, Viet-Khoa; Rognan, Didier
Format: Online Article Text
Language: English
Published: MDPI 2020
Subjects:
Online access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7352161/
https://www.ncbi.nlm.nih.gov/pubmed/32575564
http://dx.doi.org/10.3390/ijms21124380
_version_ 1783557572979064832
author Tran-Nguyen, Viet-Khoa
Rognan, Didier
author_facet Tran-Nguyen, Viet-Khoa
Rognan, Didier
author_sort Tran-Nguyen, Viet-Khoa
collection PubMed
description Developing realistic data sets for evaluating virtual screening methods is a task that has been tackled by the cheminformatics community for many years. Numerous artificially constructed data collections were developed, such as DUD, DUD-E, or DEKOIS. However, they all suffer from multiple drawbacks, one of which is the absence of experimental results confirming the impotence of presumably inactive molecules, leading to possible false negatives in the ligand sets. In light of this problem, the PubChem BioAssay database, an open-access repository providing the bioactivity information of compounds that were already tested on a biological target, is now a recommended source for data set construction. Nevertheless, there exist several issues with the use of such data that need to be properly addressed. In this article, an overview of benchmarking data collections built upon experimental PubChem BioAssay input is provided, along with a thorough discussion of noteworthy issues that one must consider during the design of new ligand sets from this database. The points raised in this review are expected to guide future developments in this regard, in hopes of offering better evaluation tools for novel in silico screening procedures.
format Online
Article
Text
id pubmed-7352161
institution National Center for Biotechnology Information
language English
publishDate 2020
publisher MDPI
record_format MEDLINE/PubMed
spelling pubmed-73521612020-07-15 Benchmarking Data Sets from PubChem BioAssay Data: Current Scenario and Room for Improvement Tran-Nguyen, Viet-Khoa Rognan, Didier Int J Mol Sci Review Developing realistic data sets for evaluating virtual screening methods is a task that has been tackled by the cheminformatics community for many years. Numerous artificially constructed data collections were developed, such as DUD, DUD-E, or DEKOIS. However, they all suffer from multiple drawbacks, one of which is the absence of experimental results confirming the impotence of presumably inactive molecules, leading to possible false negatives in the ligand sets. In light of this problem, the PubChem BioAssay database, an open-access repository providing the bioactivity information of compounds that were already tested on a biological target, is now a recommended source for data set construction. Nevertheless, there exist several issues with the use of such data that need to be properly addressed. In this article, an overview of benchmarking data collections built upon experimental PubChem BioAssay input is provided, along with a thorough discussion of noteworthy issues that one must consider during the design of new ligand sets from this database. The points raised in this review are expected to guide future developments in this regard, in hopes of offering better evaluation tools for novel in silico screening procedures. MDPI 2020-06-19 /pmc/articles/PMC7352161/ /pubmed/32575564 http://dx.doi.org/10.3390/ijms21124380 Text en © 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
spellingShingle Review
Tran-Nguyen, Viet-Khoa
Rognan, Didier
Benchmarking Data Sets from PubChem BioAssay Data: Current Scenario and Room for Improvement
title Benchmarking Data Sets from PubChem BioAssay Data: Current Scenario and Room for Improvement
topic Review
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7352161/
https://www.ncbi.nlm.nih.gov/pubmed/32575564
http://dx.doi.org/10.3390/ijms21124380