Application of kernel functions for accurate similarity search in large chemical databases

Bibliographic Details
Main authors:	Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H
Format:	Text
Language:	English
Published:	BioMed Central 2010
Subjects:	Proceedings
Online access:	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2863067/ https://www.ncbi.nlm.nih.gov/pubmed/20438655 http://dx.doi.org/10.1186/1471-2105-11-S3-S8

author	Wang, Xiaohong; Huan, Jun; Smalter, Aaron; Lushington, Gerald H
collection	PubMed
description	BACKGROUND: Similarity search in chemical structure databases is an important problem with many applications in chemical genomics, drug design, and efficient chemical probe screening, among others. It is widely believed that structure-based methods provide an efficient way to perform such queries. Recently, various graph kernel functions have been designed to capture the intrinsic similarity of graphs. Though successful in constructing accurate predictive and classification models, graph kernel functions cannot be applied to large chemical compound databases because of their high computational complexity and the difficulty of indexing similarity search for large databases. RESULTS: To bridge graph kernel functions and similarity search in chemical databases, we applied a novel kernel-based similarity measurement, developed in our team, to measure the similarity of graph-represented chemicals. In our method, we use a hash table to support a new graph kernel function definition, efficient storage, and fast search. We have applied our method, named G-hash, to large chemical databases. Our results show that the G-hash method achieves state-of-the-art performance for k-nearest neighbor (k-NN) classification. Moreover, the similarity measurement and the index structure are scalable to large chemical databases, with a smaller index size and faster query processing time than state-of-the-art indexing methods such as Daylight fingerprints, C-tree, and GraphGrep. CONCLUSIONS: Efficient similarity query processing for large chemical databases is challenging because it requires balancing running-time efficiency against similarity search accuracy. Our previous similarity search method, G-hash, provides a new way to perform similarity search in chemical databases. An experimental study validates the utility of G-hash in chemical databases.
format	Text
id	pubmed-2863067
institution	National Center for Biotechnology Information
language	English
publishDate	2010
publisher	BioMed Central
record_format	MEDLINE/PubMed
spelling	pubmed-2863067 2010-05-04 BMC Bioinformatics Proceedings BioMed Central 2010-04-29 /pmc/articles/PMC2863067/ /pubmed/20438655 http://dx.doi.org/10.1186/1471-2105-11-S3-S8 Text en Copyright ©2010 Huan et al; licensee BioMed Central Ltd. http://creativecommons.org/licenses/by/2.0 This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
title	Application of kernel functions for accurate similarity search in large chemical databases
topic	Proceedings
url	https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2863067/ https://www.ncbi.nlm.nih.gov/pubmed/20438655 http://dx.doi.org/10.1186/1471-2105-11-S3-S8

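The abstract describes G-hash only at a high level: a hash table supports the graph kernel definition, compact storage, and fast k-NN search over graph-represented chemicals. The Python sketch below illustrates that general idea under stated assumptions; it is not the authors' G-hash implementation. The node feature used as the hash key (atom label plus sorted neighbor labels), the (labels, adjacency) graph format, the names `node_keys`, `kernel`, and `HashIndex`, and the toy hydrogen-suppressed molecules and class labels are all hypothetical. The kernel counts node pairs whose hash keys collide, and distances use the standard kernel-induced metric sqrt(k(q,q) + k(g,g) - 2k(q,g)).

```python
from collections import Counter, defaultdict
from math import sqrt

# Hypothetical graph format: (labels, adjacency), where labels maps
# node id -> atom label and adjacency maps node id -> neighbor node ids.


def node_keys(graph):
    """Hash every node by a local feature: its own label plus the sorted
    labels of its neighbors (an illustrative choice, not the published one)."""
    labels, adj = graph
    return Counter(
        (labels[v], tuple(sorted(labels[u] for u in adj.get(v, []))))
        for v in labels
    )


def kernel(keys_a, keys_b):
    """Count node pairs (one per graph) whose hash keys match. This is a
    dot product of key-count vectors, so it is a valid (PSD) kernel."""
    return sum(n * keys_b[key] for key, n in keys_a.items() if key in keys_b)


class HashIndex:
    """Hypothetical hash-table index: hash key -> [(graph_id, count)]."""

    def __init__(self):
        self.table = defaultdict(list)
        self.self_kernel = {}  # graph_id -> k(G, G), computed at insert time
        self.classes = {}      # graph_id -> class label used for k-NN voting

    def add(self, graph_id, graph, cls):
        keys = node_keys(graph)
        for key, count in keys.items():
            self.table[key].append((graph_id, count))
        self.self_kernel[graph_id] = kernel(keys, keys)
        self.classes[graph_id] = cls

    def knn(self, query, k):
        """Return the k nearest (distance, graph_id) pairs."""
        q = node_keys(query)
        # Cross terms k(q, G) come only from buckets the query touches.
        cross = defaultdict(int)
        for key, count in q.items():
            for graph_id, n in self.table.get(key, []):
                cross[graph_id] += count * n
        kq = kernel(q, q)
        scored = [
            (sqrt(kq + kg - 2 * cross[gid]), gid)
            for gid, kg in self.self_kernel.items()
        ]
        return sorted(scored)[:k]

    def classify(self, query, k):
        """k-NN classification: majority vote over the k nearest graphs."""
        votes = Counter(self.classes[gid] for _, gid in self.knn(query, k))
        return votes.most_common(1)[0][0]


# Toy hydrogen-suppressed molecules (illustrative data only).
ethanol = ({0: "C", 1: "C", 2: "O"}, {0: [1], 1: [0, 2], 2: [1]})
methanol = ({0: "C", 1: "O"}, {0: [1], 1: [0]})
propane = ({0: "C", 1: "C", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})
dimethyl_ether = ({0: "C", 1: "O", 2: "C"}, {0: [1], 1: [0, 2], 2: [1]})

index = HashIndex()
index.add("methanol", methanol, "alcohol")
index.add("propane", propane, "alkane")
index.add("dimethyl_ether", dimethyl_ether, "ether")

if __name__ == "__main__":
    print(index.knn(ethanol, 3))
    print(index.classify(ethanol, 1))  # 'alcohol' (nearest graph is methanol)
```

Because each cross term k(q, G) is accumulated only from hash buckets the query actually hits, a database graph that shares no key with the query costs nothing beyond its precomputed self-kernel. This sketch still ranks every indexed graph at the end; a real index would add pruning, which the abstract does not describe.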
Similar items