Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision
BACKGROUND: Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. RESULTS: Sets of 25 searches us...
Main authors: | Holliday, John D; Kanoulas, Evangelos; Malim, Nurul; Willett, Peter |
---|---|
Format: | Online Article Text |
Language: | English |
Published: | BioMed Central 2011 |
Subjects: | |
Online access: | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3195112/ https://www.ncbi.nlm.nih.gov/pubmed/21824430 http://dx.doi.org/10.1186/1758-2946-3-29 |
_version_ | 1782214069677195264 |
---|---|
author | Holliday, John D Kanoulas, Evangelos Malim, Nurul Willett, Peter |
author_sort | Holliday, John D |
collection | PubMed |
description | BACKGROUND: Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. RESULTS: Sets of 25 searches using either the same reference structure and 25 different similarity measures (similarity fusion) or 25 different reference structures and the same similarity measure (group fusion) show that large numbers of unique molecules are retrieved by just a single search, but that the numbers of unique molecules decrease very rapidly as more searches are considered. This rapid decrease is accompanied by a rapid increase in the fraction of those retrieved molecules that are active. There is an approximately log-log relationship between the numbers of different molecules retrieved and the number of searches carried out, and a rationale for this power-law behaviour is provided. CONCLUSIONS: Using multiple searches provides a simple way of increasing the precision of a similarity search, and thus provides a justification for the use of data fusion methods in virtual screening. |
format | Online Article Text |
id | pubmed-3195112 |
institution | National Center for Biotechnology Information |
language | English |
publishDate | 2011 |
publisher | BioMed Central |
record_format | MEDLINE/PubMed |
spelling | pubmed-3195112 2011-10-18 Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision Holliday, John D Kanoulas, Evangelos Malim, Nurul Willett, Peter J Cheminform Research Article BACKGROUND: Data fusion methods are widely used in virtual screening, and make the implicit assumption that the more often a molecule is retrieved in multiple similarity searches, the more likely it is to be active. This paper tests the correctness of this assumption. RESULTS: Sets of 25 searches using either the same reference structure and 25 different similarity measures (similarity fusion) or 25 different reference structures and the same similarity measure (group fusion) show that large numbers of unique molecules are retrieved by just a single search, but that the numbers of unique molecules decrease very rapidly as more searches are considered. This rapid decrease is accompanied by a rapid increase in the fraction of those retrieved molecules that are active. There is an approximately log-log relationship between the numbers of different molecules retrieved and the number of searches carried out, and a rationale for this power-law behaviour is provided. CONCLUSIONS: Using multiple searches provides a simple way of increasing the precision of a similarity search, and thus provides a justification for the use of data fusion methods in virtual screening. BioMed Central 2011-08-08 /pmc/articles/PMC3195112/ /pubmed/21824430 http://dx.doi.org/10.1186/1758-2946-3-29 Text en Copyright ©2011 Holliday et al; licensee Chemistry Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. |
spellingShingle | Research Article Holliday, John D Kanoulas, Evangelos Malim, Nurul Willett, Peter Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision |
title | Multiple search methods for similarity-based virtual screening: analysis of search overlap and precision |
topic | Research Article |
url | https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3195112/ https://www.ncbi.nlm.nih.gov/pubmed/21824430 http://dx.doi.org/10.1186/1758-2946-3-29 |
work_keys_str_mv | AT hollidayjohnd multiplesearchmethodsforsimilaritybasedvirtualscreeninganalysisofsearchoverlapandprecision AT kanoulasevangelos multiplesearchmethodsforsimilaritybasedvirtualscreeninganalysisofsearchoverlapandprecision AT malimnurul multiplesearchmethodsforsimilaritybasedvirtualscreeninganalysisofsearchoverlapandprecision AT willettpeter multiplesearchmethodsforsimilaritybasedvirtualscreeninganalysisofsearchoverlapandprecision |
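
Illustrative note on the abstract: the overlap analysis it describes can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' code. The function names `overlap_profile` and `loglog_slope`, the molecule identifiers, and the toy search results are all hypothetical, and "numbers of unique molecules" is read here as the number of molecules retrieved by exactly k of the searches, which is one possible interpretation of the abstract.

```python
from collections import Counter
import math


def overlap_profile(searches, actives):
    """Map k -> (molecules retrieved by exactly k searches, fraction of them active)."""
    # Count, for each molecule, how many searches retrieved it.
    hits = Counter(mol for result in searches for mol in set(result))
    profile = {}
    for k in sorted(set(hits.values())):
        mols = [m for m, count in hits.items() if count == k]
        n_active = sum(1 for m in mols if m in actives)
        profile[k] = (len(mols), n_active / len(mols))
    return profile


def loglog_slope(profile):
    """Least-squares slope of log(molecule count) against log(k).

    Needs at least two distinct values of k.
    """
    xs = [math.log(k) for k in profile]
    ys = [math.log(n) for n, _ in profile.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx


# Hypothetical toy data: three searches (the paper uses sets of 25).
searches = [
    {"a", "b", "c", "d", "e"},
    {"a", "b", "c", "f"},
    {"a", "g"},
]
actives = {"a", "b"}  # assumed known actives

profile = overlap_profile(searches, actives)
for k, (n_mols, precision) in profile.items():
    print(f"retrieved by {k} search(es): {n_mols} molecules, precision {precision:.2f}")
print("log-log slope:", loglog_slope(profile))
```

In this toy data the molecule count falls and the fraction of actives rises as k grows, which mirrors the qualitative trend the abstract reports; the numbers themselves carry no meaning beyond illustration.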