Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision

Bibliographic Details
Main Authors: Holliday, John D, Kanoulas, Evangelos, Malim, Nurul, Willett, Peter
Format: Online Article Text
Language: English
Published: BioMed Central 2011
Subjects:
Online Access: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3195112/
https://www.ncbi.nlm.nih.gov/pubmed/21824430
http://dx.doi.org/10.1186/1758-2946-3-29
_version_ 1782214069677195264
author Holliday, John D
Kanoulas, Evangelos
Malim, Nurul
Willett, Peter
collection PubMed
description BACKGROUND: Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. RESULTS: Sets of 25 searches using either the same reference structure and 25 different similarity measures (similarity fusion) or 25 different reference structures and the same similarity measure (group fusion) show that large numbers of unique molecules are retrieved by just a single search, but that the numbers of unique molecules decrease very rapidly as more searches are considered. This rapid decrease is accompanied by a rapid increase in the fraction of those retrieved molecules that are active. There is an approximately log-log relationship between the numbers of different molecules retrieved and the number of searches carried out, and a rationale for this power-law behaviour is provided. CONCLUSIONS: Using multiple searches provides a simple way of increasing the precision of a similarity search, and thus provides a justification for the use of data fusion methods in virtual screening.
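The overlap analysis summarised in the description can be illustrated with a minimal sketch. It is not the authors' code: the function name `overlap_precision`, the toy molecule identifiers, and the choice to group molecules by the exact number of searches that retrieve them are all assumptions made for illustration.

```python
from collections import Counter


def overlap_precision(search_results, actives):
    """Group molecules by how many searches retrieved them.

    search_results: iterable of sets of molecule IDs, one set per search.
    actives: set of molecule IDs known to be active.
    Returns {k: (n_molecules, fraction_active)} for molecules retrieved
    by exactly k searches (an assumed reading of "overlap").
    """
    hits = Counter()
    for retrieved in search_results:
        # set() guards against a molecule listed twice in one search
        hits.update(set(retrieved))

    by_k = {}
    for mol, k in hits.items():
        total, active = by_k.get(k, (0, 0))
        by_k[k] = (total + 1, active + (mol in actives))

    return {k: (total, active / total)
            for k, (total, active) in sorted(by_k.items())}


# Hypothetical toy data: three searches instead of the 25 used in the paper.
searches = [
    {"m1", "m2", "m3", "m4"},
    {"m1", "m2", "m5"},
    {"m1", "m6"},
]
actives = {"m1", "m2"}

table = overlap_precision(searches, actives)
for k, (n, precision) in table.items():
    print(k, n, precision)
```

In this toy example most molecules are retrieved by a single search, and the fraction of actives is higher among the molecules retrieved by more than one search, which is the qualitative pattern the description reports.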
format Online
Article
Text
id pubmed-3195112
institution National Center for Biotechnology Information
language English
publishDate 2011
publisher BioMed Central
record_format MEDLINE/PubMed
spelling pubmed-3195112 2011-10-18 J Cheminform Research Article BioMed Central 2011-08-08 Text en Copyright ©2011 Holliday et al; licensee Chemistry Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision
topic Research Article
url https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3195112/
https://www.ncbi.nlm.nih.gov/pubmed/21824430
http://dx.doi.org/10.1186/1758-2946-3-29